Why MLflow for LLMs
MLflow was built for traditional ML experiment tracking (hyperparameters, metrics, artifacts). In version 2.x it added first-class LLM support: automatic tracing of LLM calls, prompt versioning, and a model registry that understands fine-tuned checkpoints. If your team already uses MLflow for ML, adding LLM observability is a natural extension.
Setup
pip install mlflow openai
mlflow server --host 0.0.0.0 --port 5000
Automatic LLM Tracing
MLflow 2.x can auto-patch OpenAI, LangChain, and LlamaIndex. Each integration is switched on with its own autolog() call; for OpenAI:
import mlflow
import openai
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("llm-experiments")
# Enable autologging — automatically traces all OpenAI calls
mlflow.openai.autolog()
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain HNSW indexing"}],
)
Every call appears as a trace in the MLflow UI with input, output, model name, token counts, and latency.
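Autologging keeps token counts inside each trace. If you also want them as run-level metrics that can be compared across runs, one option is to copy them from the response yourself. The sketch below assumes the response follows the OpenAI Chat Completions shape, where `response.usage` exposes `prompt_tokens`, `completion_tokens`, and `total_tokens`; the helper names `usage_to_metrics` and `log_usage` and the metric names are illustrative, not part of MLflow.

```python
from types import SimpleNamespace


def usage_to_metrics(usage) -> dict[str, float]:
    """Flatten an OpenAI-style usage object into a metrics dict.

    `usage` is assumed to expose prompt_tokens, completion_tokens and
    total_tokens, as Chat Completions responses do.
    """
    return {
        "prompt_tokens": float(usage.prompt_tokens),
        "completion_tokens": float(usage.completion_tokens),
        "total_tokens": float(usage.total_tokens),
    }


def log_usage(response) -> None:
    """Log token usage from one response to the current MLflow run."""
    # Imported lazily so usage_to_metrics stays usable without MLflow installed.
    import mlflow

    # mlflow.log_metrics starts a run automatically if none is active.
    mlflow.log_metrics(usage_to_metrics(response.usage))


# Stand-in for response.usage so the helper can be tried offline.
fake_usage = SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42)
print(usage_to_metrics(fake_usage))
```

In a real session you would call `log_usage(response)` right after `client.chat.completions.create(...)`.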
Manual Logging With Runs
with mlflow.start_run(run_name="rag-v2-eval"):
    mlflow.log_param("model", "gpt-4o-mini")
    mlflow.log_param("chunk_size", 512)
    mlflow.log_param("retrieval_k", 4)
    # Run your RAG evaluation (evaluate_rag_pipeline is a placeholder for your own code)
    scores = evaluate_rag_pipeline()
    mlflow.log_metric("faithfulness", scores["faithfulness"])
    mlflow.log_metric("answer_relevancy", scores["answer_relevancy"])
    mlflow.log_metric("latency_p95_ms", scores["p95_latency"])
Compare run metrics in the MLflow UI to see which prompt or chunking strategy performs best.
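The run above calls `evaluate_rag_pipeline()` without defining it. As a rough sketch of what such a function might return, the code below aggregates per-question results into the three keys that get logged. Everything here is hypothetical: the `EvalRecord` shape, the `summarize` and `p95` helpers, and the sample numbers are made up for illustration, and the per-question scores are assumed to come from whatever evaluator you already use.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRecord:
    """One evaluated question (hypothetical shape)."""
    faithfulness: float      # 0..1, from your evaluator of choice
    answer_relevancy: float  # 0..1
    latency_ms: float        # wall-clock time for the full RAG call


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("p95 requires at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Aggregate per-question results into the keys logged to MLflow."""
    return {
        "faithfulness": mean(r.faithfulness for r in records),
        "answer_relevancy": mean(r.answer_relevancy for r in records),
        "p95_latency": p95([r.latency_ms for r in records]),
    }


# Made-up sample data, only to show the output shape.
records = [
    EvalRecord(0.9, 0.8, 420.0),
    EvalRecord(0.7, 0.6, 380.0),
    EvalRecord(0.8, 1.0, 950.0),
]
scores = summarize(records)
print(scores)
```

Nearest-rank is used here because it always returns an observed latency; with small samples it is effectively the maximum, so a p95 over a handful of questions should be read with care.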
MLflow Tracing for LLM Chains
Decorate any function to create a span:
import mlflow

@mlflow.trace(span_type="CHAIN")
def run_rag(question: str) -> str:
    # Parent span: the two traced calls below show up as child spans
    docs = retrieve_docs(question)
    return generate_answer(docs, question)

@mlflow.trace(span_type="RETRIEVER")
def retrieve_docs(query: str) -> list[str]:
    # Placeholder: replace with your vector search
    results: list[str] = []
    return results

@mlflow.trace(span_type="LLM")
def generate_answer(docs: list[str], question: str) -> str:
    # Placeholder: replace with your LLM call
    answer = ""
    return answer
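The decorator traces a whole function. For a step that lives inside a function, MLflow also provides the `mlflow.start_span` context manager, which lets you set inputs and outputs by hand. The sketch below wraps a toy keyword-overlap reranker in such a span; `rerank` and `traced_rerank` are hypothetical helpers, and the scoring rule is only a stand-in for a real reranking model.

```python
def rerank(docs: list[str], query: str, top_k: int = 2) -> list[str]:
    """Toy reranker: order docs by how many query words they contain."""
    terms = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(terms & set(doc.lower().split()))

    # sorted() is stable, so ties keep their original order.
    return sorted(docs, key=overlap, reverse=True)[:top_k]


def traced_rerank(docs: list[str], query: str) -> list[str]:
    """Run rerank() inside a manually created span."""
    # Imported lazily so rerank() can be tested without MLflow installed.
    import mlflow

    with mlflow.start_span(name="rerank", span_type="RERANKER") as span:
        span.set_inputs({"query": query, "num_docs": len(docs)})
        ranked = rerank(docs, query)
        span.set_outputs({"docs": ranked})
        return ranked


docs = ["HNSW builds a layered graph", "Bananas are yellow", "HNSW indexing is fast"]
print(rerank(docs, "HNSW indexing"))
```

Called from inside a function decorated with `@mlflow.trace`, the span should nest under that function's span; called on its own, it starts a new trace.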
Model Registry for Fine-Tuned Models
Log a fine-tuned model checkpoint as a registered model:
with mlflow.start_run() as run:
    # log_artifacts (plural) uploads a whole directory; log_artifact handles a single file
    mlflow.log_artifacts("./fine-tuned-llama-3.1-8b/", artifact_path="model")
    mlflow.register_model(
        model_uri=f"runs:/{run.info.run_id}/model",
        name="llama-3.1-8b-customer-support",
    )
Promote versions from Staging to Production using the UI or the API:
# Note: model stages are deprecated as of MLflow 2.9; newer code typically uses aliases
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="llama-3.1-8b-customer-support",
    version=3,
    stage="Production",
)
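On MLflow 2.9 or later, the same promotion can be expressed with a registry alias instead of a stage. The following is a minimal sketch under that assumption: `alias_uri` and `promote` are hypothetical helpers, and the alias name "production" is an arbitrary choice. `set_registered_model_alias` is the `MlflowClient` call that moves the alias.

```python
def alias_uri(name: str, alias: str) -> str:
    """Build a models:/ URI that resolves through a registry alias."""
    return f"models:/{name}@{alias}"


def promote(name: str, version: int, alias: str = "production") -> str:
    """Point `alias` at `version` and return the URI consumers should load."""
    # Imported lazily so alias_uri() works without MLflow installed.
    from mlflow import MlflowClient

    client = MlflowClient()
    client.set_registered_model_alias(name=name, alias=alias, version=str(version))
    return alias_uri(name, alias)


print(alias_uri("llama-3.1-8b-customer-support", "production"))
```

Consumers that load the model through the alias URI pick up a new version as soon as the alias moves, without any change on their side.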
LangChain Integration
pip install mlflow langchain langchain-openai
import mlflow
mlflow.langchain.autolog()
# All LangChain chain/agent calls are automatically traced
Full documentation is available at mlflow.org/docs/latest/llms and in the MLflow GitHub repo.