
MLflow for Experiment Tracking: Setup, Usage, and When It Is Enough | Pristren Blog


MLflow is an open source platform for managing the ML lifecycle. It covers experiment tracking (logging parameters, metrics, and artifacts during training), a model registry (versioning production models), and model serving. For most ML teams, MLflow's experiment tracking covers roughly 80% of what they would use Weights & Biases (W&B) for, with no license fee: MLflow is free, and you host it yourself. The trade-off is operational. With MLflow you manage the infrastructure (a tracking server and an artifact store), whereas W&B is a managed service that works out of the box, and MLflow's UI is functional but less polished than W&B's. For teams that can tolerate some infrastructure management and want to avoid per-seat SaaS pricing, MLflow is the right choice.

Setup in 5 Lines

The simplest MLflow setup requires no separate server. MLflow logs locally to a ./mlruns directory:

import mlflow

mlflow.set_experiment("my-experiment")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_param("batch_size", 32)
    mlflow.log_metric("train_loss", 0.45)
    mlflow.log_metric("val_accuracy", 0.87)
    mlflow.log_artifact("model.pkl")  # model.pkl must already exist on disk

Start the UI to view logged experiments:

mlflow ui
# Serves the UI at http://localhost:5000

That is the complete local setup. No server, no cloud account, no configuration files.

Running a Tracking Server for Teams

For team use, you need a shared tracking server so multiple people can log to the same backend. The simplest production setup uses a PostgreSQL database and an S3-compatible artifact store:

mlflow server \
  --backend-store-uri postgresql://user:password@localhost/mlflow \
  --default-artifact-root s3://your-bucket/mlflow-artifacts \
  --host 0.0.0.0 \
  --port 5000
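One practical detail with the backend store URI: a database password containing characters such as @ or / needs to be percent-encoded, or the URI will be parsed incorrectly. A small sketch of building the URI safely follows. The helper name `postgres_backend_uri` and the sample password are hypothetical, and the default port 5432 is an assumption about your PostgreSQL setup.

```python
from urllib.parse import quote_plus


def postgres_backend_uri(user: str, password: str, host: str,
                         database: str, port: int = 5432) -> str:
    """Build a PostgreSQL URI with the credentials percent-encoded."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials, for illustration only.
print(postgres_backend_uri("user", "p@ss/word", "localhost", "mlflow"))
# postgresql://user:p%40ss%2Fword@localhost:5432/mlflow
```

The resulting string is what you would pass to --backend-store-uri.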

Then point your code to the tracking server:

import mlflow

mlflow.set_tracking_uri("http://your-mlflow-server:5000")
mlflow.set_experiment("team-experiment")
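Hardcoding the server address works, but teams often want the same script to run against different servers. One possible approach, sketched below, reads the MLFLOW_TRACKING_URI environment variable (which MLflow itself also honors when no URI is set in code) and falls back to a default. The helper name `resolve_tracking_uri` is hypothetical, and the fallback address is the placeholder from the example above.

```python
import os
from typing import Mapping

# Placeholder address copied from the example above, not a real server.
FALLBACK_TRACKING_URI = "http://your-mlflow-server:5000"


def resolve_tracking_uri(env: Mapping[str, str] = os.environ) -> str:
    """Prefer MLFLOW_TRACKING_URI when it is set and non-empty."""
    return env.get("MLFLOW_TRACKING_URI") or FALLBACK_TRACKING_URI


print(resolve_tracking_uri({"MLFLOW_TRACKING_URI": "http://staging:5000"}))
print(resolve_tracking_uri({}))
```

In a training script, the result would be passed to `mlflow.set_tracking_uri(resolve_tracking_uri())`.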

For small teams (2-5 people), running MLflow on a single VM (2 vCPUs and 4 GB of RAM are sufficient) with S3 or MinIO for artifact storage works well. The infrastructure cost is $20-50/month, versus $100-500/month for W&B Team.
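To put those monthly figures on a yearly footing, here is a short worked calculation. It uses only the ranges quoted above and ignores the engineering time spent maintaining the server, which is an assumption worth revisiting for your own team.

```python
# Monthly cost ranges in USD, taken from the figures quoted above.
SELF_HOSTED_MONTHLY = (20, 50)
WANDB_TEAM_MONTHLY = (100, 500)


def annual_range(monthly: tuple[int, int]) -> tuple[int, int]:
    """Scale a (low, high) monthly range to a yearly one."""
    low, high = monthly
    return low * 12, high * 12


def annual_savings(self_hosted: tuple[int, int],
                   managed: tuple[int, int]) -> tuple[int, int]:
    """Smallest and largest yearly saving from self-hosting.

    The smallest saving pairs the priciest self-hosted setup with the
    cheapest managed plan; the largest pairs the opposite ends.
    """
    sh_low, sh_high = annual_range(self_hosted)
    m_low, m_high = annual_range(managed)
    return m_low - sh_high, m_high - sh_low


print(annual_range(SELF_HOSTED_MONTHLY))  # (240, 600)
print(annual_range(WANDB_TEAM_MONTHLY))   # (1200, 6000)
print(annual_savings(SELF_HOSTED_MONTHLY, WANDB_TEAM_MONTHLY))  # (600, 5760)
```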

Integration with Training Loops

MLflow's autolog feature integrates automatically with common ML frameworks:

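import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Example data so the snippet runs on its own; substitute your own dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.sklearn.autolog()  # Automatically logs params, training metrics, and the fitted model

with mlflow.start_run():
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    # Autolog recorded the estimator's parameters (n_estimators, max_depth, ...)
    # and training metrics such as training_accuracy_score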

Autolog supports scikit-learn, PyTorch (via PyTorch Lightning), TensorFlow/Keras, XGBoost, LightGBM, Statsmodels, and more.

For LLM fine-tuning, for example with Hugging Face Transformers, you can log hyperparameters once and metrics per epoch:

import mlflow

mlflow.set_experiment("llm-fine-tuning")

with mlflow.start_run() as run:
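    mlflow.log_params({
        "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
        "lora_rank": 16,
        "learning_rate": 2e-4,
        "num_epochs": 3,
        "dataset_size": len(train_dataset)  # train_dataset: your training set, loaded beforehand
    })

    # Training loop: training_metrics stands in for the per-epoch metric dicts your trainer yields
    for epoch, metrics in enumerate(training_metrics):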
        mlflow.log_metrics({
            "train_loss": metrics["loss"],
            "eval_perplexity": metrics["perplexity"]
        }, step=epoch)

    # Log the LoRA adapter as an artifact
    mlflow.log_artifacts("./outputs/lora-adapter")
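The eval_perplexity value logged above is usually derived rather than measured directly. Assuming the evaluation loss is a mean per-token cross-entropy in nats (the common default, but check your trainer), perplexity is its exponential. The helper name and the loss values below are hypothetical.

```python
import math


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)


# Hypothetical per-epoch evaluation losses, for illustration only.
eval_losses = [2.0, 1.5, 1.2]
for epoch, loss in enumerate(eval_losses):
    print(epoch, round(perplexity_from_loss(loss), 3))
```

Logging the raw evaluation loss alongside the perplexity keeps the run comparable even if the conversion convention changes later.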

The Model Registry

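MLflow's Model Registry versions production models and tracks their lifecycle through stages (Staging, Production, Archived). Stages have been deprecated since MLflow 2.9 in favor of model version aliases, so treat the stage-based calls in this section as the legacy workflow.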

Registering a model:

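with mlflow.start_run() as run:
    # ... training produces a fitted estimator `clf` ...
    mlflow.sklearn.log_model(
        clf,
        "model",  # artifact path within the run
        registered_model_name="customer-churn-classifier"  # registers a new version under this name
    )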

Transitioning to production:

from mlflow import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="customer-churn-classifier",
    version=3,
    stage="Production"
)

Loading the production model:

model = mlflow.sklearn.load_model("models:/customer-churn-classifier/Production")
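Recent MLflow releases offer model version aliases as the recommended alternative to stages. The sketch below shows the shape of that flow: `MlflowClient.set_registered_model_alias` points an alias at a version, and a `models:/<name>@<alias>` URI loads it. The `promote` helper, the `champion` alias name, and the `RecordingClient` stand-in (used so the example runs without a registry backend) are all assumptions for illustration.

```python
from typing import Protocol


class AliasClient(Protocol):
    """The one MlflowClient method this sketch relies on."""

    def set_registered_model_alias(self, name: str, alias: str, version: str) -> None: ...


def promote(client: AliasClient, name: str, version: int,
            alias: str = "champion") -> str:
    """Point `alias` at `version` and return the URI that loads it."""
    client.set_registered_model_alias(name, alias, str(version))
    return f"models:/{name}@{alias}"


class RecordingClient:
    """Stand-in that records calls instead of talking to a registry."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, str]] = []

    def set_registered_model_alias(self, name: str, alias: str, version: str) -> None:
        self.calls.append((name, alias, version))


demo = RecordingClient()
uri = promote(demo, "customer-churn-classifier", 3)
print(uri)  # models:/customer-churn-classifier@champion
```

Against a real registry, you would pass `MlflowClient()` as the client and load the model with `mlflow.sklearn.load_model(uri)`.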

The registry provides: version history, stage management, annotations, and a registry UI.

MLflow vs Weights & Biases

| Feature | MLflow | W&B |
|---------|--------|-----|
| Cost | Free (self-hosted) | $0/month (personal), $150+/month (team) |
| Setup | Requires infrastructure | Managed, works instantly |
| UI quality | Functional | Polished, better visualizations |
| Integrations | Good | Better (more native integrations) |
| Collaboration | Works on shared server | Built-in |
| Sweeps (HP tuning) | Basic | Excellent |
| Data viz | Basic | Rich |

When MLflow is enough:

Small teams (1-5 people) comfortable managing a server
Cost-sensitive projects
Workloads where basic logging and model registry cover your needs
Teams that want to keep training data and model artifacts on their own infrastructure

When to choose W&B:

You want zero infrastructure management
Your team needs collaborative experiment comparison across many runs
You run hyperparameter sweeps and want W&B's sweep visualization
You value the polished UI and would pay for the time saved

Keep Reading

Fine-Tuning an LLM with QLoRA — MLflow tracking for LLM fine-tuning
Hugging Face Complete Guide — Hosting fine-tuned models after tracking experiments with MLflow
Open Source LLM Benchmarks 2026 — Evaluating models whose training you tracked with MLflow

Mahmudul Haque Qudrati
