Langfuse: Open-Source LLM Observability You Can Self-Host

Langfuse brings full tracing, prompt versioning, dataset evaluation, and cost attribution to LLM apps - and you can run the entire stack on your own servers.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 26, 2026

8 min read

// tags

#langfuse #observability #tracing #open-source #self-hosted

Python SDK Integration

pip install langfuse

Decorator-Based Tracing

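# Import path used by the v2 Python SDK; later SDK releases expose
# observe from the top-level langfuse package, so check your installed version.
from langfuse.decorators import observe, langfuse_context
from openai import OpenAI

# Reads OPENAI_API_KEY from the environment.
client = OpenAI()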

@observe()
def retrieve_docs(query: str) -> str:
    # Simulate vector retrieval
    return f"Documents for: {query}"

@observe()
def generate_answer(docs: str, question: str) -> str:
    langfuse_context.update_current_observation(
        input={"docs": docs, "question": question}
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Context: {docs}"},
            {"role": "user", "content": question},
        ],
    )
    answer = response.choices[0].message.content
    langfuse_context.update_current_observation(output=answer)
    return answer

@observe()
def answer_question(question: str) -> str:
    docs = retrieve_docs(question)
    return generate_answer(docs, question)

result = answer_question("What is PagedAttention?")

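Every function decorated with @observe() automatically creates a span in the trace, nested under the span of whichever decorated function called it. By default Langfuse captures function arguments as input and return values as output; the explicit update_current_observation calls in generate_answer simply override those captured values.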

Prompt Management With Versioning

Store, version, and A/B test prompts in the Langfuse UI, then fetch them at runtime through the SDK:

from langfuse import Langfuse

lf = Langfuse()
prompt = lf.get_prompt("answer-question", version=3)
compiled = prompt.compile(context="...", question="...")

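Pinning version=3 freezes the prompt at that revision. Omit the version argument and get_prompt returns whichever version currently carries the production label, so changing a prompt in production no longer requires a code deploy.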
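The compile step is plain variable substitution into the stored template. The sketch below imitates it, assuming double-brace placeholders such as {{question}}. compile_template is a hypothetical helper written for illustration, not the SDK's code, and the real compile may treat missing variables differently.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(template: str, **variables: str) -> str:
    """Replace each {{name}} placeholder with the matching keyword argument."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Leave unknown placeholders untouched so a missing variable stays visible.
        return str(variables.get(name, match.group(0)))

    return _PLACEHOLDER.sub(substitute, template)


template = "Context: {{context}}\nQuestion: {{question}}"
compiled = compile_template(
    template,
    context="PagedAttention notes",
    question="What is PagedAttention?",
)
print(compiled)
```

Because the template lives outside the codebase, only the variable names form the contract between the application and the prompt text.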

Dataset Creation From Production Traces

Mark any trace as a dataset item directly from the UI. Build ground-truth datasets from real user interactions, then run batch evaluations to compare prompt versions or models.
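A batch evaluation is a loop over dataset items with a metric applied to each output. The following sketch shows that shape under stated assumptions: the dataset, the exact-match metric, and the two app functions standing in for two prompt versions are all made up for illustration and do not call Langfuse or any model.

```python
from statistics import mean

# Hypothetical dataset items, as they might be collected from production traces.
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3 * 3", "expected": "9"},
]


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output equals the expected answer, ignoring case."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_batch(app, items, metric=exact_match) -> float:
    """Run the app on every item and return the mean metric value."""
    return mean(metric(app(item["input"]), item["expected"]) for item in items)


# Stand-ins for the same application running two different prompt versions.
def app_v1(question: str) -> str:
    canned = {"2 + 2": "4", "capital of France": "paris"}
    return canned.get(question, "unknown")


def app_v2(question: str) -> str:
    canned = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return canned.get(question, "unknown")


scores = {"v1": run_batch(app_v1, dataset), "v2": run_batch(app_v2, dataset)}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Keeping the metric as a parameter makes it straightforward to swap exact match for a softer measure without touching the loop.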

LLM-as-Judge Scoring

from langfuse import Langfuse

lf = Langfuse()
lf.score(
    trace_id="trace-xyz",
    name="faithfulness",
    value=0.92,
    comment="Answer matches source documents",
)

Automate this with an evaluator function that runs after each generation.
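One way to structure such an evaluator is to separate the judge from the code that records the score. This is a minimal sketch under assumptions: make_evaluator and overlap_judge are hypothetical names, the word-overlap judge is a toy placeholder for a real LLM call, and record_score stands in for a function like lf.score from the snippet above.

```python
from typing import Callable


def make_evaluator(
    judge: Callable[[str, str], float],
    record_score: Callable[..., None],
    name: str = "faithfulness",
):
    """Build an evaluate() function that judges an answer and records the score."""

    def evaluate(trace_id: str, context: str, answer: str) -> float:
        # Clamp to [0, 1] so a misbehaving judge cannot record an invalid value.
        value = max(0.0, min(1.0, judge(context, answer)))
        record_score(trace_id=trace_id, name=name, value=value)
        return value

    return evaluate


def overlap_judge(context: str, answer: str) -> float:
    """Toy judge: fraction of answer words that also appear in the context."""
    answer_words = answer.lower().split()
    if not answer_words:
        return 0.0
    context_words = set(context.lower().split())
    return sum(word in context_words for word in answer_words) / len(answer_words)


# Collect scores in a list here; in an application this would send them to Langfuse.
recorded = []
evaluate = make_evaluator(overlap_judge, lambda **score: recorded.append(score))

value = evaluate(
    "trace-xyz",
    "paged attention splits the kv cache into blocks",
    "paged attention splits the cache",
)
print(value, recorded)
```

Passing the recorder in as a callable keeps the evaluator testable offline and independent of any particular SDK version.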

Self-Hosting With Docker

git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d

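The stack includes PostgreSQL, Redis, and the Langfuse web server; recent releases of the Compose file also start ClickHouse, an S3-compatible object store (MinIO), and a background worker. Full instructions are in the Langfuse self-hosting guide.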
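Once the containers are running, the Python SDK needs to be pointed at the local instance instead of the hosted service. The SDK reads its connection settings from the LANGFUSE_HOST, LANGFUSE_PUBLIC_KEY, and LANGFUSE_SECRET_KEY environment variables. The sketch below assumes the web container is published on port 3000; the langfuse_env helper and the key values are placeholders, and real keys come from the project settings page of your own instance.

```python
import os


def langfuse_env(host: str, public_key: str, secret_key: str) -> dict:
    """Build the environment variables the Langfuse SDK reads at startup."""
    if not host.startswith(("http://", "https://")):
        raise ValueError("host must include the scheme, e.g. http://localhost:3000")
    return {
        "LANGFUSE_HOST": host.rstrip("/"),
        "LANGFUSE_PUBLIC_KEY": public_key,
        "LANGFUSE_SECRET_KEY": secret_key,
    }


# Placeholder values: substitute the keys generated by your own instance.
settings = langfuse_env(
    "http://localhost:3000/",
    "pk-lf-placeholder",
    "sk-lf-placeholder",
)

# Set these before constructing the Langfuse client so it picks them up.
os.environ.update(settings)
print(os.environ["LANGFUSE_HOST"])
```

In a deployed service these values would normally come from the process environment or a secrets manager rather than being set in code.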

User-Level Cost Attribution

Pass user_id from inside any @observe()-decorated function to attribute token costs to individual users - essential for multi-tenant SaaS billing:

langfuse_context.update_current_trace(user_id="user-456", session_id="session-789")

The dashboard then shows cost-per-user histograms and per-session token usage.
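The arithmetic behind per-user attribution is a grouped sum of token counts multiplied by a price. Here is a small sketch of that calculation, assuming made-up usage events and a made-up price table in USD per million tokens; none of these numbers reflect real model pricing or Langfuse's internal data model.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000,000 tokens.
PRICES = {"example-model": {"input": 1.00, "output": 2.00}}

# Hypothetical usage events, one per model call, tagged with the user id.
usage_events = [
    {"user_id": "user-456", "model": "example-model",
     "input_tokens": 1_000_000, "output_tokens": 500_000},
    {"user_id": "user-456", "model": "example-model",
     "input_tokens": 500_000, "output_tokens": 0},
    {"user_id": "user-789", "model": "example-model",
     "input_tokens": 2_000_000, "output_tokens": 1_000_000},
]


def cost_per_user(events, prices) -> dict:
    """Sum the cost of every event, grouped by user id."""
    totals = defaultdict(float)
    for event in events:
        price = prices[event["model"]]
        cost = (
            event["input_tokens"] * price["input"]
            + event["output_tokens"] * price["output"]
        ) / 1_000_000
        totals[event["user_id"]] += cost
    return dict(totals)


costs = cost_per_user(usage_events, PRICES)
print(costs)
```

The same grouping works for session_id, which is how per-session totals can be derived from the same events.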
