MLflow is an open source platform for managing the ML lifecycle: experiment tracking (logging parameters, metrics, and artifacts during training), a model registry (versioning production models), and model serving. For most ML teams, MLflow's experiment tracking covers most of what they would use Weights & Biases for, with no license or per-seat fees, since MLflow is open source and can be fully self-hosted. The key trade-offs: MLflow requires you to manage infrastructure (a tracking server and artifact store) and pay for the machines it runs on, whereas W&B is managed and works out of the box. MLflow's UI is functional but less polished than W&B's. For teams that can tolerate some infrastructure management and want to avoid per-seat SaaS pricing, MLflow is usually the right choice.
Setup in a Few Lines
The simplest MLflow setup requires no separate server. MLflow logs locally to an ./mlruns directory:
import mlflow
mlflow.set_experiment("my-experiment")
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_param("batch_size", 32)
    mlflow.log_metric("train_loss", 0.45)
    mlflow.log_metric("val_accuracy", 0.87)
    # log_artifact uploads an existing file, so model.pkl must already be on disk
    mlflow.log_artifact("model.pkl")
Start the UI to view logged experiments:
mlflow ui
# Serves the UI at http://localhost:5000
That is the complete local setup. No server, no cloud account, no configuration files.
Running a Tracking Server for Teams
For team use, you need a shared tracking server so multiple people can log to the same backend. The simplest production setup uses a PostgreSQL database and an S3-compatible artifact store:
mlflow server \
  --backend-store-uri postgresql://user:password@localhost/mlflow \
  --default-artifact-root s3://your-bucket/mlflow-artifacts \
  --host 0.0.0.0 \
  --port 5000
Then point your code to the tracking server:
import mlflow
mlflow.set_tracking_uri("http://your-mlflow-server:5000")
mlflow.set_experiment("team-experiment")
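Hard-coding the server URL works, but the same script often has to run on a laptop, in CI, and against the shared server. MLflow reads the `MLFLOW_TRACKING_URI` environment variable when `set_tracking_uri` is not called. The sketch below makes that lookup explicit and adds a fallback. The helper name `resolve_tracking_uri` and the fallback constant are illustrative assumptions, not part of MLflow.

```python
import os

# Placeholder server address from the example above; replace with your own.
FALLBACK_TRACKING_URI = "http://your-mlflow-server:5000"


def resolve_tracking_uri(env=None):
    """Return the tracking URI from MLFLOW_TRACKING_URI, or the fallback."""
    env = os.environ if env is None else env
    # Treat an empty value the same as an unset variable.
    return env.get("MLFLOW_TRACKING_URI") or FALLBACK_TRACKING_URI
```

In a training script the result would be passed straight through, as in `mlflow.set_tracking_uri(resolve_tracking_uri())`, so switching backends only requires changing the environment.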
For small teams (2-5 people), running MLflow on a single VM (2 vCPU and 4 GB RAM is sufficient) with S3 or MinIO for artifact storage works well. Infrastructure typically costs around $20-50/month, versus roughly $100-500/month for a W&B team plan, depending on seat count.
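To put those monthly figures in annual terms, the arithmetic can be sketched as follows. The dollar ranges are the estimates quoted in this article, not measured prices, and the function name is hypothetical.

```python
def annual_savings_range(infra_monthly, saas_monthly):
    """Return (worst_case, best_case) yearly savings from self-hosting.

    Each argument is a (low, high) monthly cost range in dollars.
    """
    infra_low, infra_high = infra_monthly
    saas_low, saas_high = saas_monthly
    worst = (saas_low - infra_high) * 12  # cheapest SaaS plan vs. priciest VM
    best = (saas_high - infra_low) * 12   # priciest SaaS plan vs. cheapest VM
    return worst, best


# Ranges quoted in this article: $20-50/month for a VM, $100-500/month for W&B.
print(annual_savings_range((20, 50), (100, 500)))  # (600, 5760)
```

This ignores the time spent maintaining the server, so treat the result as an upper bound on what self-hosting saves.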
Integration with Training Loops
MLflow's autolog feature integrates automatically with common ML frameworks:
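import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Example data; substitute your own training set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
mlflow.sklearn.autolog()  # Automatically logs params, training metrics, and the model
with mlflow.start_run():
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    # Autolog recorded the estimator's params (n_estimators, max_depth, ...) and training metrics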
Autolog supports scikit-learn, PyTorch (through PyTorch Lightning), TensorFlow/Keras, XGBoost, LightGBM, Statsmodels, and more.
For LLM fine-tuning with Hugging Face Transformers, you can log hyperparameters and per-epoch metrics manually:
import mlflow
mlflow.set_experiment("llm-fine-tuning")
# train_dataset and training_metrics are placeholders for your own dataset and per-epoch results
with mlflow.start_run() as run:
    mlflow.log_params({
        "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
        "lora_rank": 16,
        "learning_rate": 2e-4,
        "num_epochs": 3,
        "dataset_size": len(train_dataset)
    })
    # Training loop...
    for epoch, metrics in enumerate(training_metrics):
        mlflow.log_metrics({
            "train_loss": metrics["loss"],
            "eval_perplexity": metrics["perplexity"]
        }, step=epoch)
    # Log the LoRA adapter as an artifact
    mlflow.log_artifacts("./outputs/lora-adapter")
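`mlflow.log_params` expects a flat mapping of names to values, while fine-tuning configs are often nested (for example, a LoRA section inside a larger training config). One option is to flatten the config into dotted keys before logging. This is a minimal sketch: `flatten_params` and the sample config are hypothetical, and only the resulting flat dict would be handed to MLflow.

```python
def flatten_params(config, prefix=""):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the dotted prefix along.
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested config mirroring the parameters logged above.
config = {
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
    "lora": {"rank": 16},
    "optimizer": {"learning_rate": 2e-4, "num_epochs": 3},
}
params = flatten_params(config)
# params now has keys such as "lora.rank" and "optimizer.learning_rate".
# Inside a run you would call mlflow.log_params(params).
```

Dotted keys keep related settings grouped together when you sort or filter columns in the MLflow UI.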
The Model Registry
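MLflow's Model Registry versions production models and tracks their lifecycle through stages (Staging, Production, Archived). Note that stages are deprecated as of MLflow 2.9 in favor of model version aliases, so treat the stage-based calls in this section as the legacy workflow.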
Registering a model:
with mlflow.start_run() as run:
    # ... training ...
    mlflow.sklearn.log_model(
        clf,
        "model",
        registered_model_name="customer-churn-classifier"
    )
Transitioning to production:
from mlflow import MlflowClient
client = MlflowClient()
client.transition_model_version_stage(
    name="customer-churn-classifier",
    version=3,
    stage="Production"
)
Loading the production model:
model = mlflow.sklearn.load_model("models:/customer-churn-classifier/Production")
The registry provides: version history, stage management, annotations, and a registry UI.
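Recent MLflow releases offer aliases as the replacement for stages: an alias is a named pointer (such as `champion`) attached to one specific model version. The sketch below shows what an alias-based promotion could look like. `promote_with_alias` and the alias name are hypothetical; `set_registered_model_alias` and the `models:/<name>@<alias>` URI form come from MLflow. The client is passed in as an argument so the helper can be exercised without a live registry.

```python
def promote_with_alias(client, name, version, alias="champion"):
    """Point `alias` at `version` of a registered model and return its URI.

    `client` is expected to be an mlflow.MlflowClient, or any object that
    exposes the same set_registered_model_alias(name, alias, version) method.
    """
    client.set_registered_model_alias(name, alias, str(version))
    # Models can be loaded by alias using the "models:/<name>@<alias>" form.
    return f"models:/{name}@{alias}"
```

With a real client, `promote_with_alias(MlflowClient(), "customer-churn-classifier", 3)` returns `models:/customer-churn-classifier@champion`, which can be passed to `mlflow.sklearn.load_model`. Moving the alias to a newer version later changes what that URI resolves to without touching the serving code.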
MLflow vs Weights & Biases
| Feature | MLflow | W&B |
|---------|--------|-----|
| Cost | Free (self-hosted) | $0/month (personal), $150+/month (team) |
| Setup | Requires infrastructure | Managed, works instantly |
| UI quality | Functional | Polished, better visualizations |
| Integrations | Good | Better (more native integrations) |
| Collaboration | Works on shared server | Built-in |
| Sweeps (HP tuning) | Basic | Excellent |
| Data viz | Basic | Rich |
When MLflow is enough:
- Small teams (1-5 people) comfortable managing a server
- Cost-sensitive projects
- Workloads where basic logging and model registry cover your needs
- Teams that want to keep training data and model artifacts on their own infrastructure
When to choose W&B:
- You want zero infrastructure management
- Your team needs collaborative experiment comparison across many runs
- You run hyperparameter sweeps and want W&B's sweep visualization
- You value the polished UI and would pay for the time saved