DSPy: Automatic Prompt Optimization for Complex LLM Pipelines

DSPy optimizes LLM prompts automatically using your data. Here is when it helps, when it does not, and a complete setup guide for a real use case.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 17, 2026

9 min read

// tags

#dspy #prompt-optimization #llm #ai-frameworks


Complete Example: Optimizing an Entity Extractor

import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the LLM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Define signature
class ExtractEntities(dspy.Signature):
    """Extract named entities from text."""
    text: str = dspy.InputField()
    entities: list[str] = dspy.OutputField()

# Define module
class EntityExtractor(dspy.Module):
    def __init__(self):
        super().__init__()  # let dspy.Module set up its internal state first
        self.extractor = dspy.Predict(ExtractEntities)

    def forward(self, text):
        return self.extractor(text=text)

# Training data (labeled examples)
trainset = [
    dspy.Example(
        text="Apple CEO Tim Cook announced the new iPhone at WWDC in San Francisco.",
        entities=["Apple", "Tim Cook", "iPhone", "WWDC", "San Francisco"]
    ).with_inputs("text"),
    # ... more examples
]

# Define metric: set-based F1 over exact-match entity strings
def entity_f1(example, pred, trace=None):
    expected = set(example.entities)
    predicted = set(pred.entities) if isinstance(pred.entities, list) else set()
    # Guard both sets: an empty one would cause a division by zero below
    if not expected or not predicted:
        return 0.0
    overlap = len(expected & predicted)
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall) if overlap else 0.0

# Optimize
optimizer = BootstrapFewShot(metric=entity_f1, max_labeled_demos=4)
extractor = EntityExtractor()
optimized = optimizer.compile(extractor, trainset=trainset)

# Use the optimized module
result = optimized(text="Tesla's Elon Musk unveiled new models in Austin, Texas.")
print(result.entities)
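
To make the metric's behavior concrete, here is a small standalone sketch that scores one hypothetical prediction for the Tesla sentence above. It re-implements the same set-based F1 in plain Python so it runs without DSPy or an API key. The helper name `set_f1` and the `gold` and `predicted` values are assumptions made up for illustration, not real model output.

```python
def set_f1(expected: list[str], predicted: list[str]) -> float:
    """Set-based F1 with exact string matching, mirroring entity_f1 above."""
    gold_set, guess_set = set(expected), set(predicted)
    overlap = len(gold_set & guess_set)
    if overlap == 0:
        # Covers empty inputs and predictions with no exact matches.
        return 0.0
    precision = overlap / len(guess_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and prediction (illustrative only).
gold = ["Tesla", "Elon Musk", "Austin", "Texas"]
predicted = ["Tesla", "Elon Musk", "Austin, Texas"]

score = set_f1(gold, predicted)
print(f"F1 = {score:.3f}")
```

Only two of the three predicted strings match a label exactly, so precision is 2/3, recall is 2/4, and F1 is 4/7, or about 0.57. The merged "Austin, Texas" earns no credit under exact matching, so it may be worth normalizing entity strings or labeling consistently before optimizing against this metric.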

When DSPy Helps

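Multi-step pipelines where each step depends on the previous one. A RAG system with query rewriting, retrieval, answer generation, and citation verification has four stages, and typically every stage except retrieval is an LLM call with its own prompt. Prompt quality compounds across stages: an error in the first stage propagates through the rest of the pipeline. DSPy can optimize the prompts for all of the LLM stages together against a single end-to-end metric, rather than tuning each one in isolation.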
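
The compounding effect can be sketched with simple arithmetic. The snippet below assumes each stage fails independently, which real pipelines only approximate, and the per-stage success rates are hypothetical numbers chosen for illustration rather than measurements.

```python
import math


def end_to_end_success(stage_rates: list[float]) -> float:
    """Probability that every stage succeeds, assuming independent failures."""
    return math.prod(stage_rates)


# Hypothetical per-stage success rates for the four stages named above.
stages = {
    "query rewriting": 0.90,
    "retrieval": 0.90,
    "answer generation": 0.90,
    "citation verification": 0.90,
}

baseline = end_to_end_success(list(stages.values()))
improved = end_to_end_success([0.95] * len(stages))

print(f"baseline: {baseline:.2f}, improved: {improved:.2f}")
```

Under these assumptions, four stages at 90% each succeed together only about 66% of the time, while lifting each stage to 95% raises that to about 81%. Small per-stage gains add up, which is why optimizing the whole pipeline can matter more than polishing one prompt.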

When you are switching between LLM providers. A pipeline whose prompts were tuned for GPT-4o can degrade noticeably on a smaller model such as Mistral 7B. Re-running DSPy optimization against the new model typically takes hours and yields prompts tuned to that model, whereas adapting the prompts by hand can take days.

When you have labeled data. DSPy requires training examples with ground truth labels to optimize against. If you have 50-500 labeled examples for your task, DSPy can use them. If you do not have labels, you need a different approach.
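
One practical consequence is that some of those labeled examples should be held out to check whether optimization actually helped. The following is a minimal sketch of a reproducible split in plain Python. The helper name `split_examples`, the 80/20 ratio, and the placeholder records are all assumptions for illustration. In a DSPy program each record would then be wrapped the same way as the trainset in the example above.

```python
import random


def split_examples(records: list[dict], dev_fraction: float = 0.2, seed: int = 0):
    """Shuffle a copy of the records and return (train, dev) lists."""
    if not 0 < dev_fraction < 1:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Hypothetical labeled records with the same fields as the entity example.
records = [
    {"text": f"Sentence {i}", "entities": [f"Entity{i}"]} for i in range(50)
]

train, dev = split_examples(records)
print(len(train), len(dev))
```

With 50 records and the assumed 20% dev fraction, this yields 40 training and 10 held-out examples. Scoring both the unoptimized and the optimized program on the held-out set gives a fairer comparison than scoring on the data the optimizer saw.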

When DSPy Does Not Help

Simple single-step applications. If your application calls an LLM once to do a straightforward task, DSPy's overhead (learning the framework, setting up optimization runs, managing labeled datasets) rarely produces better results than a well-written manual prompt.

When you have no labeled data. DSPy requires labeled examples. If you do not have them, you either need to create them (expensive) or use a different optimization approach.

Real-time, latency-sensitive applications that would need to re-optimize on the fly. DSPy optimization runs take minutes to hours, so they have to happen offline. The optimized program itself is fast to run, but the optimization process cannot sit in the request path.

When the task changes frequently. DSPy optimization produces prompts optimized for a fixed task. If your task definition changes often, you have to re-run optimization after every change, which quickly becomes impractical.

Keep Reading

LangChain vs LlamaIndex Comparison - Other frameworks for building LLM pipelines
Open Source LLM Benchmarks 2026 - The models DSPy can optimize prompts for
How Large Language Models Work - The underlying mechanics that explain why prompts matter
