[Data Science] From Zero to “Ah-ha!”: A Practical MLflow Tutorial (with Bite-Sized Examples)
This post is a hands-on, copy-pasteable guide for turning your experiments into clean, queryable, and reproducible assets with MLflow.
We’ll start small, then layer on features—params, metrics, artifacts, datasets, model signatures, the Model Registry, and serving.
0) One-time setup
Install the basics (per project virtualenv recommended):
pip install -U mlflow scikit-learn pandas matplotlib pyarrow
Point your code at your tracking server (replace with your URL):
export MLFLOW_TRACKING_URI="http://<your-vps>:5000"
# If you enabled basic auth:
# export MLFLOW_TRACKING_USERNAME="user"
# export MLFLOW_TRACKING_PASSWORD="pass"
Quick sanity check (should return the server URI, not file:///...):
python -c "import mlflow; print(mlflow.get_tracking_uri())"
1) The tiniest experiment: params, metrics, artifact
Goal: create a run that records a parameter, a metric series, and a small file.
# 01_minimal.py
import time
from pathlib import Path
import mlflow
mlflow.set_experiment("blog-mlflow-basics")
with mlflow.start_run(run_name=f"hello-{int(time.time())}"):
mlflow.log_param("model_family", "baseline")
for step, val in enumerate([0.71, 0.74, 0.76, 0.78]):
mlflow.log_metric("accuracy", val, step=step)
time.sleep(0.05)
Path("artifacts").mkdir(exist_ok=True)
Path("artifacts/notes.txt").write_text("First run ✅\n")
mlflow.log_artifact("artifacts/notes.txt", artifact_path="notes")
print("Run:", mlflow.active_run().info.run_id)
Check in the UI: Experiments → blog-mlflow-basics → (your run)
You should see Parameters, Metrics (with a line chart), and Artifacts (notes/notes.txt).
2) A real model: sklearn + confusion matrix figure
Goal: train, log params/metrics, and store a figure as artifact.
# 02_sklearn_and_figure.py
import mlflow, pandas as pd, matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, ConfusionMatrixDisplay
from sklearn.ensemble import RandomForestClassifier
from pathlib import Path
mlflow.set_experiment("blog-mlflow-basics")
iris = datasets.load_iris(as_frame=True)
X = iris.frame[iris.feature_names]
y = iris.frame["target"].astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
params = {"n_estimators": 200, "max_depth": 3, "random_state": 42}
with mlflow.start_run(run_name="rf-iris"):
mlflow.log_params(params)
mlflow.set_tags({"dataset": "iris", "stage": "dev"})
model = RandomForestClassifier(**params).fit(Xtr, ytr)
preds = model.predict(Xte)
acc = accuracy_score(yte, preds)
mlflow.log_metric("accuracy", acc)
# Log a figure
fig_path = Path("artifacts/confusion.png")
fig_path.parent.mkdir(exist_ok=True)
disp = ConfusionMatrixDisplay.from_predictions(yte, preds)
plt.tight_layout(); plt.savefig(fig_path, dpi=160); plt.close()
mlflow.log_artifact(str(fig_path), artifact_path="plots")
UI tip: On the experiment page, tick multiple runs → Compare to see overlaid metric curves and param tables.
3) Log datasets + model signature + input example
Goal: make your run portable, so someone else can load the model and know the expected columns and types.
# 03_signature_and_datasets.py
import mlflow, pandas as pd
from mlflow.models.signature import infer_signature
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
mlflow.set_experiment("blog-mlflow-basics")
iris = datasets.load_iris(as_frame=True)
X = iris.frame[iris.feature_names]
y = iris.frame["target"].astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
with mlflow.start_run(run_name="logreg-iris"):
model = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
sig = infer_signature(Xtr, model.predict(Xtr))
mlflow.sklearn.log_model(
model, "model",
signature=sig,
input_example=Xtr.head(3)
)
# Log datasets (best-effort: works on MLflow 3.x)
try:
from mlflow.data.pandas_dataset import from_pandas
mlflow.log_input(from_pandas(Xtr.join(ytr.rename("target")), name="iris_train"), context="training")
mlflow.log_input(from_pandas(Xte.join(yte.rename("target")), name="iris_test"), context="testing")
except Exception as e:
mlflow.log_text(str(e), "logs/datasets_warning.txt")
Why it matters: The signature is validated at load/serve time, preventing silent schema drift.
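To see what a consumer of your run actually gets, load the logged model back as a generic pyfunc and inspect its schema. A minimal sketch (replace <run_id> with the logreg-iris run ID from the UI):
# Illustrative follow-up, not one of the numbered scripts above
import mlflow.pyfunc
loaded = mlflow.pyfunc.load_model("runs:/<run_id>/model")
print(loaded.metadata.get_input_schema())  # expected column names and dtypes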
4) Autologging (one line, lots of value)
Goal: get params, metrics, and the model logged automatically.
# 04_autolog.py
import mlflow
mlflow.set_experiment("blog-mlflow-autolog")
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
mlflow.autolog() # <-- magic line (works with many frameworks)
X, y = datasets.load_breast_cancer(return_X_y=True, as_frame=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=17, stratify=y)
with mlflow.start_run(run_name="gb-autolog"):
m = GradientBoostingClassifier().fit(Xtr, ytr)
mlflow.log_metric("holdout_acc", accuracy_score(yte, m.predict(Xte)))
Autologging is great, but explicitly logging key artifacts/plots is still a good habit.
5) Register a model and manage stages
Goal: turn a one-off model artifact into a versioned, named asset with stages (Staging, Production, Archived).
# 05_register_and_promote.py
import mlflow
from mlflow import MlflowClient
from mlflow.exceptions import MlflowException
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
mlflow.set_experiment("blog-mlflow-registry")
X, y = datasets.load_breast_cancer(return_X_y=True)
with mlflow.start_run(run_name="rf-register") as r:
m = RandomForestClassifier(n_estimators=300, random_state=7).fit(X, y)
mlflow.sklearn.log_model(m, "model")
client = MlflowClient()
mv = client.create_model_version(
name="breast_cancer_rf",
source=f"{r.info.artifact_uri}/model",
run_id=r.info.run_id
)
print("Created version:", mv.version)
client.transition_model_version_stage(
name="breast_cancer_rf", version=mv.version, stage="Staging"
)
print("Promoted to Staging.")
Load by name+stage (anywhere):
import mlflow.pyfunc
mlflow.set_tracking_uri("http://<your-vps>:5000")
model = mlflow.pyfunc.load_model("models:/breast_cancer_rf/Staging")
print(model.predict([[14.0]*30])[:1])
6) Serve the production model as a REST API
Goal: ship the current Production model without writing a new server. (Above we only promoted to Staging, so transition a version to Production first, or serve models:/breast_cancer_rf/Staging while testing.)
mlflow models serve -m "models:/breast_cancer_rf/Production" -p 8000 --host 0.0.0.0
Curl test:
curl -X POST http://127.0.0.1:8000/invocations \
-H "Content-Type: application/json" \
-d '{"inputs": [[14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14]]}'
7) Compare runs and pick winners (programmatically)
Goal: query your history with filters and order by metrics.
# 07_query_runs.py
import mlflow
from mlflow.entities import ViewType
mlflow.set_experiment("blog-mlflow-basics")
exp = mlflow.get_experiment_by_name("blog-mlflow-basics")
df = mlflow.search_runs(
    experiment_ids=[exp.experiment_id],
    filter_string='metrics.accuracy > 0.75 and tags.dataset = "iris"',
    order_by=["metrics.accuracy DESC"],
    output_format="pandas",
    max_results=20,
    run_view_type=ViewType.ACTIVE_ONLY,
)
print(df[["run_id", "metrics.accuracy", "params.n_estimators", "tags.dataset"]])
In the UI, use the search bar with the same filter grammar.
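From there it is one more step to reuse the winner. A sketch, assuming the top run logged a model under the artifact path "model" (as in example 03):
# Illustrative: load the best run's model for downstream use
if not df.empty:
    best_run_id = df.iloc[0]["run_id"]
    best_model = mlflow.pyfunc.load_model(f"runs:/{best_run_id}/model")
    print("Loaded model from run:", best_run_id)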
8) Nested runs for multi-step pipelines
Goal: keep each phase (prep → train → eval) separate but linked.
# 08_nested_runs.py
import mlflow, time
mlflow.set_experiment("blog-mlflow-nested")
with mlflow.start_run(run_name="pipeline"):
mlflow.set_tag("pipeline", "prep-train-eval")
with mlflow.start_run(run_name="prep", nested=True):
time.sleep(0.1)
mlflow.log_metric("rows_kept", 980)
with mlflow.start_run(run_name="train", nested=True):
mlflow.log_param("lr", 1e-3)
mlflow.log_metric("train_loss", 0.12)
with mlflow.start_run(run_name="eval", nested=True):
mlflow.log_metric("val_auc", 0.91)
UI: The parent run shows Children; click through to see step-level artifacts and metrics.
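You can also pull the children back out programmatically via the mlflow.parentRunId tag that MLflow sets on nested runs. A sketch (fill in the pipeline run's ID, e.g. copied from the UI):
# Illustrative: list the child runs of the "pipeline" parent
import mlflow
parent_run_id = "<pipeline-run-id>"
children = mlflow.search_runs(
    experiment_names=["blog-mlflow-nested"],
    filter_string=f'tags.`mlflow.parentRunId` = "{parent_run_id}"',
)
print(children[["run_id", "tags.mlflow.runName"]])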
9) Reproducibility: pin code & environment
Goal: capture code version and the Python env that produced the model.
# 09_reproducibility.py
import os, subprocess, mlflow, json, sys
mlflow.set_experiment("blog-mlflow-repro")
def git_commit():
    try:
        return subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()
    except Exception:
        return "unknown"
with mlflow.start_run(run_name="env-and-code"):
    mlflow.set_tag("git_commit", git_commit())
    # Log a simple requirements snapshot
    reqs = subprocess.check_output([sys.executable, "-m", "pip", "freeze"]).decode()
    mlflow.log_text(reqs, "env/requirements.txt")
    # Optional: log a minimal conda env file
    conda_env = {
        "name": "mlflow-env",
        "channels": ["conda-forge"],
        "dependencies": ["python={}".format(".".join(map(str, sys.version_info[:3]))), "pip", {"pip": ["mlflow"]}],
    }
    mlflow.log_text(json.dumps(conda_env, indent=2), "env/conda_env.json")
Tip: mlflow.<flavor>.log_model(..., pip_requirements=[...], conda_env=...) lets you embed env specs inside the model artifact.
10) Model evaluation (one-liner)
Goal: log a battery of metrics/plots for classification problems.
# 10_evaluate.py
import mlflow
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
with mlflow.start_run(run_name="evaluate-example"):
m = LogisticRegression(max_iter=2000).fit(Xtr, ytr)
res = mlflow.evaluate(
model=m,
data=Xte.assign(label=yte),
targets="label",
model_type="classifier",
evaluators=["default"], # logs metrics + confusion matrix + ROC, etc.
)
print(res.metrics.keys())
11) Serving-friendly signatures: strict schemas
Goal: fail fast if inputs are wrong at serving time.
# 11_strict_signature.py
import mlflow
from mlflow.models.signature import infer_signature
from sklearn.linear_model import LinearRegression
import pandas as pd
X = pd.DataFrame({"x1":[1,2,3], "x2":[0.1,0.2,0.3]})
y = pd.Series([3.2, 5.1, 7.0])
with mlflow.start_run(run_name="strict-sig"):
m = LinearRegression().fit(X, y)
sig = infer_signature(X, m.predict(X))
mlflow.sklearn.log_model(m, "model", signature=sig, input_example=X.head(2))
When you later call the REST API with wrong columns/types, MLflow will reject the request.
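You can see the same enforcement locally before serving. A sketch (replace <run_id> with the strict-sig run's ID):
# Illustrative: signature enforcement when loading as pyfunc
import mlflow.pyfunc, pandas as pd
loaded = mlflow.pyfunc.load_model("runs:/<run_id>/model")
print(loaded.predict(pd.DataFrame({"x1": [4], "x2": [0.4]})))  # matches the signature
try:
    loaded.predict(pd.DataFrame({"x1": [4]}))  # "x2" missing -> rejected
except Exception as e:
    print("Rejected:", type(e).__name__)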
12) Tagging conventions that pay off later
Use consistent tags to supercharge search & dashboards:
- project, dataset, owner, stage (dev, abtest, prod-candidate)
- git_commit, git_branch, feature_flags
- training_job_id, ml_platform (e.g., “k8s”, “ray”, “sagemaker”)
mlflow.set_tags({
    "project": "sensor-forecast",
    "dataset": "v2025-10-01",
    "owner": "you",
    "stage": "dev",
})
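The payoff: consistent tags turn cross-experiment queries into a single call. A sketch using the illustrative tag values above:
# Illustrative: find every dev run of the project, newest first
import mlflow
candidates = mlflow.search_runs(
    search_all_experiments=True,
    filter_string='tags.project = "sensor-forecast" and tags.stage = "dev"',
    order_by=["attributes.start_time DESC"],
)
print(len(candidates), "matching runs")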
13) A quick “gotchas” checklist
- Artifacts point at file:///... and fail to write? Your server likely isn’t serving artifacts. Start it with:
  - --serve-artifacts --artifacts-destination file:///path (or S3/MinIO), so the server proxies artifacts over HTTP, or
  - --default-artifact-root s3://bucket/..., so clients write artifacts directly to a shared store they can reach.
- Shadowed import: An AttributeError: partially initialized module 'mlflow'... usually means you have a local mlflow.py file. Rename it and restart the kernel.
- Conda vs pip mix: Prefer one channel per env; if mixing, install heavy deps (numpy/scipy) via conda first, then pip install mlflow.
- Tracking URI confusion: Print it before you run:
  print(mlflow.get_tracking_uri())
  Make sure it’s http(s)://..., not file:///....
14) A tiny template you can reuse
# template_train.py
import os, time, mlflow
from typing import Dict
MLFLOW_URI = os.getenv("MLFLOW_TRACKING_URI", "http://<your-vps>:5000")
EXPERIMENT = os.getenv("MLFLOW_EXPERIMENT", "my-project")
def train_and_log(params: Dict[str, float]) -> str:
    mlflow.set_tracking_uri(MLFLOW_URI)
    mlflow.set_experiment(EXPERIMENT)
    with mlflow.start_run(run_name=f"train-{int(time.time())}") as run:
        mlflow.log_params(params)
        # ... train ...
        mlflow.log_metric("final_metric", 0.123)  # note: "@" is not allowed in metric keys
        # mlflow.log_artifact("path/to/plot.png", artifact_path="plots")
    return run.info.run_id
if __name__ == "__main__":
    run_id = train_and_log({"learning_rate": 3e-4, "batch_size": 64})
    print("Logged run:", run_id)
Wrap-up
You now have everything to:
- Track parameters, metrics, figures, tables, and datasets
- Save models with signatures and examples
- Register versions, promote to stages, and serve over REST
- Query and compare runs at scale
- Keep experiments reproducible and discoverable