Model Logging Made Easy: MLflow vs. Pickle

Using MLflow to log models offers distinct advantages over Pickle. Here’s a detailed look at the benefits:

Managing Library Versions

Problem

Different machine learning models may depend on various versions of the same library, leading to compatibility conflicts. Manually tracking and configuring the correct environment for each model can be tedious and error-prone.

Solution

MLflow automatically logs all dependencies, allowing users to easily recreate the exact environment necessary to run the model. This feature simplifies deployment and enhances reproducibility.
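To appreciate what MLflow automates here, consider what manual dependency tracking looks like. The sketch below uses only the standard library; `snapshot_environment` is a hypothetical helper for illustration, not part of MLflow:

```python
import sys
from importlib.metadata import version, PackageNotFoundError

def snapshot_environment(packages):
    """Record the Python version and each package's installed version."""
    env = {"python": sys.version.split()[0]}
    for pkg in packages:
        try:
            env[pkg] = version(pkg)
        except PackageNotFoundError:
            env[pkg] = "not installed"
    return env

print(snapshot_environment(["scikit-learn", "numpy"]))
```

Doing this by hand for every model, and keeping it in sync with the code, is exactly the chore MLflow's automatic dependency logging removes.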

Documenting Inputs and Outputs

Problem

The expected inputs and outputs of a model are often not well-documented, making it challenging for others to utilize the model correctly.

Solution

MLflow defines a clear schema for inputs and outputs, ensuring that users know precisely what data to provide and what to expect in return. This clarity fosters better collaboration and reduces confusion.
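As a sketch of what a schema enables, the snippet below validates an input array against a declared dtype and shape before prediction. `validate_input` is a hypothetical helper written for illustration, not MLflow's API, but it loosely mirrors the enforcement MLflow applies when a model logged with a signature is invoked through pyfunc:

```python
import numpy as np

def validate_input(data, dtype="int64", shape=(-1, 1)):
    """Reject inputs whose dtype or shape do not match the declared schema.
    A -1 dimension means 'any size', like the shapes in an MLflow signature."""
    if data.dtype != np.dtype(dtype):
        raise TypeError(f"expected dtype {dtype}, got {data.dtype}")
    if len(shape) != data.ndim or any(
        s != -1 and s != d for s, d in zip(shape, data.shape)
    ):
        raise ValueError(f"expected shape {shape}, got {data.shape}")
    return data

validate_input(np.array([[1], [2]], dtype="int64"))  # passes
```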

Example Implementation

To demonstrate the advantages of MLflow, let’s implement a simple logistic regression model and log it.

Logging the Model

import mlflow
from mlflow.models import infer_signature
import numpy as np
from sklearn.linear_model import LogisticRegression

with mlflow.start_run():
    # Train a simple logistic regression model on toy data
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr = LogisticRegression()
    lr.fit(X, y)

    # Infer the input/output schema from the training data and predictions
    signature = infer_signature(X, lr.predict(X))

    # Log the model together with its signature and dependencies
    model_info = mlflow.sklearn.log_model(
        sk_model=lr, artifact_path="model", signature=signature
    )

    print(f"Saving data to {model_info.model_uri}")

This code will output the location where the model is saved:

Saving data to runs:/f8b0fc900aa14cf0ade8d0165c5a9f11/model

Using the Logged Model

To use the logged model later, you can load it with the model_uri:

import mlflow
import numpy as np

# Load the model by its run URI; pyfunc provides a uniform predict() interface
model_uri = "runs:/1e20d72afccf450faa3b8a9806a97e83/model"
sklearn_pyfunc = mlflow.pyfunc.load_model(model_uri=model_uri)

data = np.array([-4, 1, 0, 10, -2, 1]).reshape(-1, 1)
predictions = sklearn_pyfunc.predict(data)

Inspecting Model Artifacts

Check the saved artifacts in mlruns/0/1e20d72afccf450faa3b8a9806a97e83/artifacts/model:

    MLmodel           model.pkl         requirements.txt
    conda.yaml        python_env.yaml

Understanding Model Configuration

The MLmodel file contains critical information about the model, including its dependencies and input/output specifications:

artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.11.6
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.4.1.post1
mlflow_version: 2.15.0
model_size_bytes: 722
model_uuid: e7487bc3c4ab417c965144efcecaca2f
run_id: 1e20d72afccf450faa3b8a9806a97e83
signature:
  inputs: '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1, 1]}}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1]}}]'
  params: null
utc_time_created: '2024-08-02 20:58:16.516963'
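Because the signature schemas are stored as JSON strings, they can be read back programmatically. Parsing the inputs entry from the MLmodel file above shows the expected dtype and shape:

```python
import json

# The signature's inputs field, copied verbatim from the MLmodel file above
inputs = json.loads(
    '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1, 1]}}]'
)
spec = inputs[0]["tensor-spec"]
print(spec["dtype"], spec["shape"])  # int64 [-1, 1]
```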

Environment Specifications

The conda.yaml and python_env.yaml files detail the dependencies required for the model, ensuring a consistent runtime environment. Here’s a look at conda.yaml:

# conda.yaml
channels:
- conda-forge
dependencies:
- python=3.11.6
- pip<=24.2
- pip:
  - mlflow==2.15.0
  - cloudpickle==2.2.1
  - numpy==1.23.5
  - psutil==5.9.6
  - scikit-learn==1.4.1.post1
  - scipy==1.11.3
name: mlflow-env

And for python_env.yaml and requirements.txt:

# python_env.yaml
python: 3.11.6
build_dependencies:
- pip==24.2
- setuptools
- wheel==0.40.0
dependencies:
- -r requirements.txt
# requirements.txt
mlflow==2.15.0
cloudpickle==2.2.1
numpy==1.23.5
psutil==5.9.6
scikit-learn==1.4.1.post1
scipy==1.11.3
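One practical payoff of these pinned files is that you can check whether your current environment matches the logged one before loading the model. The sketch below uses only the standard library; `check_pins` is a hypothetical helper, not an MLflow function:

```python
from importlib.metadata import version, PackageNotFoundError

def check_pins(requirements_text):
    """Compare '==' pins against the installed environment.
    Returns a dict mapping package -> (pinned version, installed version or None)."""
    report = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed)
    return report

# Example with one pin from the logged requirements.txt above
print(check_pins("numpy==1.23.5"))
```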

Learn more about MLflow Models.
