Model Logging Made Easy: MLflow vs. Pickle

Using MLflow to log models offers distinct advantages over saving them with Pickle alone. Here’s a closer look at the benefits:

Managing Library Versions


Different machine learning models may depend on various versions of the same library, leading to compatibility conflicts. Manually tracking and configuring the correct environment for each model can be tedious and error-prone.


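When you log a model, MLflow automatically records its dependencies: the Python version and pinned versions of the packages the model needs (for a scikit-learn model, packages such as scikit-learn, NumPy, SciPy, and cloudpickle). These are written to environment files stored alongside the model, so users can recreate the environment needed to run it. This simplifies deployment and improves reproducibility.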
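For contrast, here is roughly what you would have to do by hand with Pickle alone: save the model and, next to it, a list of pinned package versions. This is a minimal sketch, not MLflow's implementation. The helper `save_with_requirements` and the package list you pass it are hypothetical; MLflow infers the relevant packages for you instead of relying on an explicit list.

```python
import pickle
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def save_with_requirements(model, out_dir, packages):
    """Hypothetical helper: pickle `model` and pin the versions of `packages`.

    A pickle file records no library versions, so they are written to a
    requirements.txt next to model.pkl.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)

    pins = []
    for name in packages:
        try:
            pins.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment, so there is nothing to pin.
            continue
    (out / "requirements.txt").write_text("\n".join(pins) + "\n")
    return pins
```

A call such as `save_with_requirements(lr, "my_model", ["scikit-learn", "numpy"])` works, but you still have to remember which packages matter and keep that list current as the code changes, which is exactly the bookkeeping MLflow takes over.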

Documenting Inputs and Outputs


The expected inputs and outputs of a model are often not well-documented, making it challenging for others to utilize the model correctly.


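MLflow lets you attach a model signature: a schema describing the expected inputs and outputs (for example, tensor dtypes and shapes, or column names and types for tabular data). The signature is stored with the model and checked when the model is used for prediction, so users know precisely what data to provide and what to expect in return. This clarity fosters better collaboration and reduces confusion.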
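To make the idea of a schema concrete, here is a small sketch of the shape check that a tensor signature implies: the input must have the declared number of dimensions, and `-1` means "any size" along that axis (the convention MLflow uses in its tensor specs). The function `matches_shape` is hypothetical and only compares shapes given as tuples; MLflow's own enforcement does more, including dtype checks.

```python
def matches_shape(actual, declared):
    """Hypothetical check: does shape `actual` conform to `declared`?

    Both are tuples of ints; -1 in `declared` accepts any size on that axis.
    """
    if len(actual) != len(declared):
        return False  # wrong number of dimensions
    return all(d == -1 or a == d for a, d in zip(actual, declared))
```

With NumPy, `matches_shape(X.shape, (-1, 1))` accepts any number of rows but rejects a flat 1-D array or an array with two columns.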

Example Implementation

To demonstrate the advantages of MLflow, let’s implement a simple logistic regression model and log it.

Logging the Model

import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
import numpy as np
from sklearn.linear_model import LogisticRegression

with mlflow.start_run():
    # Toy training data: one feature, binary labels
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr = LogisticRegression()
    lr.fit(X, y)

    # Infer the input/output schema from sample inputs and predictions
    signature = infer_signature(X, lr.predict(X))

    # Log the model together with its signature and environment files
    model_info = mlflow.sklearn.log_model(
        sk_model=lr, artifact_path="model", signature=signature
    )
    print(f"Saving data to {model_info.model_uri}")

This code will output the location where the model is saved:

Saving data to runs:/f8b0fc900aa14cf0ade8d0165c5a9f11/model

Using the Logged Model

To use the logged model later, load it by its model URI. A new run ID is generated for every run, so substitute the URI printed by your own run:

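import mlflow
import numpy as np

# Replace with the URI printed when you logged the model
model_uri = "runs:/1e20d72afccf450faa3b8a9806a97e83/model"
# Load the model through MLflow's generic python_function interface
sklearn_pyfunc = mlflow.pyfunc.load_model(model_uri=model_uri)
# New inputs should match the logged signature: a 2-D integer array with one column
data = np.array([-4, 1, 0, 10, -2, 1]).reshape(-1, 1)
predictions = sklearn_pyfunc.predict(data)
print(predictions)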

Inspecting Model Artifacts

With the default local tracking setup, the saved artifacts live in mlruns/0/1e20d72afccf450faa3b8a9806a97e83/artifacts/model (0 is the ID of the Default experiment):

    MLmodel           model.pkl         requirements.txt
    conda.yaml        python_env.yaml
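The runs:/ URI printed earlier and the directory above are two views of the same location. The sketch below maps one to the other, assuming the default local file store (./mlruns) and the Default experiment (ID 0). The helper `local_artifact_dir` is hypothetical; for other tracking backends, `mlflow.artifacts.download_artifacts` is the general way to resolve an artifact URI.

```python
from pathlib import PurePosixPath


def local_artifact_dir(model_uri, experiment_id="0", root="mlruns"):
    """Hypothetical helper: map 'runs:/<run_id>/<path>' to a local directory.

    Assumes a local file-based tracking store laid out as
    <root>/<experiment_id>/<run_id>/artifacts/<path>.
    """
    prefix = "runs:/"
    if not model_uri.startswith(prefix):
        raise ValueError(f"not a runs:/ URI: {model_uri!r}")
    # Split '<run_id>/<artifact_path>' at the first slash
    run_id, _, artifact_path = model_uri[len(prefix):].partition("/")
    return PurePosixPath(root, experiment_id, run_id, "artifacts", artifact_path)
```

For example, `local_artifact_dir("runs:/1e20d72afccf450faa3b8a9806a97e83/model")` gives the mlruns/0/.../artifacts/model path listed above.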

Understanding Model Configuration

The MLmodel file ties everything together: it lists the model's flavors (how to load it), points to the environment files, and records the library versions and the input/output signature:

artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.11.6
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.4.1.post1
mlflow_version: 2.15.0
model_size_bytes: 722
model_uuid: e7487bc3c4ab417c965144efcecaca2f
run_id: 1e20d72afccf450faa3b8a9806a97e83
signature:
  inputs: '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1, 1]}}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1]}}]'
  params: null
utc_time_created: '2024-08-02 20:58:16.516963'
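The inputs and outputs entries of the signature are JSON strings, so they can be read with the standard library. This sketch copies the two strings from the MLmodel file above and extracts each tensor's dtype and shape; the function name `tensor_specs` is hypothetical, since MLflow parses these strings itself when it loads the model.

```python
import json

# Copied verbatim from the signature section of the MLmodel file above.
INPUTS = '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1, 1]}}]'
OUTPUTS = '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1]}}]'


def tensor_specs(spec_json):
    """Hypothetical helper: return (dtype, shape) for each tensor in a spec string."""
    specs = []
    for entry in json.loads(spec_json):
        tensor = entry["tensor-spec"]
        specs.append((tensor["dtype"], tuple(tensor["shape"])))
    return specs


print(tensor_specs(INPUTS))   # [('int64', (-1, 1))]
print(tensor_specs(OUTPUTS))  # [('int64', (-1,))]
```

Read this way, the model expects a 2-D int64 tensor with any number of rows and exactly one column, and returns a 1-D int64 vector with one prediction per row.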

Environment Specifications

The conda.yaml and python_env.yaml files detail the dependencies required for the model, ensuring a consistent runtime environment. Here’s a look at conda.yaml:

# conda.yaml
channels:
- conda-forge
dependencies:
- python=3.11.6
- pip<=24.2
- pip:
  - mlflow==2.15.0
  - cloudpickle==2.2.1
  - numpy==1.23.5
  - psutil==5.9.6
  - scikit-learn==1.4.1.post1
  - scipy==1.11.3
name: mlflow-env
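Because the pip section of conda.yaml is a list of name==version pins, it is straightforward to compare it with the packages installed in your current environment before loading a model. A minimal sketch, using a hypothetical `check_pins` helper that handles only exact == pins (not ranges such as pip<=24.2) and compares version strings literally:

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Hypothetical helper: compare 'name==version' pins with installed packages.

    Returns a dict mapping each package name to 'ok', 'missing',
    or 'installed <version>' when the installed version differs.
    """
    report = {}
    for pin in pins:
        name, sep, wanted = pin.partition("==")
        if not sep:
            continue  # skip anything that is not an exact pin
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == wanted else f"installed {installed}"
    return report
```

For example, `check_pins(["mlflow==2.15.0", "scikit-learn==1.4.1.post1", "numpy==1.23.5"])` flags any package whose installed version differs from the one the model was logged with.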

And here are python_env.yaml and requirements.txt:

# python_env.yaml
python: 3.11.6
build_dependencies:
- pip==24.2
- setuptools
- wheel==0.40.0
dependencies:
- -r requirements.txt
# requirements.txt

