DSPy (Declarative Self-improving Python) is a framework for building LLM-powered programs that optimizes prompts and few-shot examples automatically, rather than requiring you to hand-write them. Instead of crafting a prompt like "You are a helpful assistant that extracts named entities from text. Here are some examples:...", you define input and output signatures in Python and let DSPy find optimal prompts and examples using your data and a metric. DSPy is most valuable for complex multi-step LLM pipelines where prompt quality significantly impacts output quality and where you have labeled data to optimize against. For simple single-step applications (one LLM call with a straightforward task), the overhead of DSPy rarely justifies itself.
The Core Idea
Standard LLM development has a frustrating property: the best prompt for a task is highly sensitive to the specific model, task formulation, and examples you choose. Prompts that work well for GPT-4o often fail on Claude or Mistral. Prompts that work on a development set sometimes degrade on production traffic. Hand-tuning prompts is time-consuming and does not transfer across models.
DSPy's approach: treat prompts as hyperparameters to optimize rather than code to write. You specify:
- The signature (input fields and output fields with descriptions)
- The metric (how to evaluate if an output is good)
- The training data (labeled examples of good inputs and outputs)
DSPy's optimizer then searches for prompts and few-shot examples that maximize the metric on your training data.
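The idea of treating a prompt as a hyperparameter can be sketched in plain Python without DSPy at all. This is only an illustration under stated assumptions: `fake_llm`, `exact_match`, and `pick_best_prompt` are hypothetical names, and the "model" is a toy function, not a real LLM call. DSPy's actual optimizers are more sophisticated than this exhaustive search.

```python
# Hypothetical stand-in for an LLM call: it only "extracts" capitalized
# words when the prompt mentions entities, and returns nothing otherwise.
def fake_llm(prompt, text):
    if "entities" not in prompt:
        return []
    return [w.strip(".,") for w in text.split() if w[0].isupper()]

# Metric: 1.0 when the predicted set equals the labeled set, else 0.0.
def exact_match(expected, predicted):
    return 1.0 if set(expected) == set(predicted) else 0.0

def pick_best_prompt(candidates, trainset, metric, llm):
    """Return the candidate prompt with the highest mean metric on trainset."""
    def mean_score(prompt):
        scores = [metric(labels, llm(prompt, text)) for text, labels in trainset]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)

# Training data: (input text, labeled entities) pairs.
trainset = [("Alice met Bob.", ["Alice", "Bob"])]
candidates = ["Summarize the text.", "List the named entities in the text."]

best = pick_best_prompt(candidates, trainset, exact_match, fake_llm)
print(best)
```

The three ingredients map directly onto the list above: the function arguments play the role of the signature, `exact_match` is the metric, and `trainset` is the labeled data.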
Core Concepts
Signatures: Typed input/output declarations
import dspy

class ExtractEntities(dspy.Signature):
    """Extract named entities from text."""
    text: str = dspy.InputField()
    entities: list[str] = dspy.OutputField(desc="List of named entities (people, organizations, locations)")
Modules: LLM calls with signatures
class EntityExtractor(dspy.Module):
    def __init__(self):
        super().__init__()
        # dspy.Predict turns the signature into a single LLM call
        self.extractor = dspy.Predict(ExtractEntities)

    def forward(self, text):
        return self.extractor(text=text)
Optimizers: Find the best prompts
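- dspy.BootstrapFewShot: Bootstraps few-shot demonstrations by running the program on training examples and keeping the runs that pass the metric (simplest, fastest)
- dspy.BootstrapFewShotWithRandomSearch: Repeats the bootstrapping and uses random search over candidate sets of few-shot examples (better quality, slower)
- dspy.MIPROv2: Jointly optimizes instructions and few-shot examples using Bayesian optimization (best quality, slowest)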
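The bootstrapping idea behind the few-shot optimizers can be approximated in a few lines of plain Python. This is a simplified sketch, not DSPy's implementation: `bootstrap_demos`, `toy_teacher`, and `same_set` are hypothetical names, and the teacher is a toy function standing in for an LLM-backed program.

```python
def bootstrap_demos(teacher, trainset, metric, max_demos=4):
    """Run the teacher on each training input and keep outputs that pass the metric."""
    demos = []
    for text, expected in trainset:
        predicted = teacher(text)
        if metric(expected, predicted):
            # A passing run becomes a few-shot demonstration for later prompts.
            demos.append({"text": text, "entities": predicted})
        if len(demos) >= max_demos:
            break
    return demos

# Hypothetical teacher: treats every capitalized word as an entity.
def toy_teacher(text):
    return [w.strip(".,") for w in text.split() if w[0].isupper()]

# Pass/fail metric: predicted entities must match the labels exactly.
def same_set(expected, predicted):
    return set(expected) == set(predicted)

trainset = [
    ("Alice met Bob.", ["Alice", "Bob"]),
    ("The cat sat.", []),  # teacher wrongly returns ["The"], so this run is dropped
    ("Carol flew to Paris.", ["Carol", "Paris"]),
]

demos = bootstrap_demos(toy_teacher, trainset, same_set, max_demos=2)
print([d["text"] for d in demos])
```

The metric acts as a filter here: only runs it accepts are reused as demonstrations, which is why a metric that is too lenient can let poor examples into the prompt.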
Metrics: Evaluation functions
def entity_accuracy(example, prediction, trace=None):
    expected = set(example.entities)
    predicted = set(prediction.entities)
    precision = len(expected & predicted) / len(predicted) if predicted else 0
    recall = len(expected & predicted) / len(expected) if expected else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    return f1
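To make the F1 arithmetic concrete, here is a worked example that runs without DSPy. It repeats the metric so the block is self-contained and uses `SimpleNamespace` objects as stand-ins for a DSPy example and prediction; the entity values are made up for illustration.

```python
from types import SimpleNamespace

def entity_accuracy(example, prediction, trace=None):
    expected = set(example.entities)
    predicted = set(prediction.entities)
    overlap = len(expected & predicted)
    precision = overlap / len(predicted) if predicted else 0
    recall = overlap / len(expected) if expected else 0
    return 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

# Stand-ins for a labeled example and a model prediction (illustrative values).
example = SimpleNamespace(entities=["Apple", "Tim Cook", "iPhone", "WWDC", "San Francisco"])
prediction = SimpleNamespace(entities=["Apple", "Tim Cook", "Cupertino"])

# 2 of 3 predictions are correct (precision 2/3); 2 of 5 labels are found (recall 2/5).
score = entity_accuracy(example, prediction)
print(round(score, 3))
```

With precision 2/3 and recall 2/5, F1 works out to 2 · (2/3) · (2/5) / (2/3 + 2/5) = 0.5.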
Complete Example: Optimizing an Entity Extractor
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the LLM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Define signature
class ExtractEntities(dspy.Signature):
    """Extract named entities from text."""
    text: str = dspy.InputField()
    entities: list[str] = dspy.OutputField()

# Define module
class EntityExtractor(dspy.Module):
    def __init__(self):
        super().__init__()
        self.extractor = dspy.Predict(ExtractEntities)

    def forward(self, text):
        return self.extractor(text=text)

# Training data (labeled examples)
trainset = [
    dspy.Example(
        text="Apple CEO Tim Cook announced the new iPhone at WWDC in San Francisco.",
        entities=["Apple", "Tim Cook", "iPhone", "WWDC", "San Francisco"],
    ).with_inputs("text"),
    # ... more examples
]

# Define metric: F1 between the expected and predicted entity sets
def entity_f1(example, pred, trace=None):
    expected = set(example.entities)
    # Treat a malformed (non-list) prediction as empty
    predicted = set(pred.entities) if isinstance(pred.entities, list) else set()
    # Guard both sets so neither division below can hit zero
    if not expected or not predicted:
        return 0
    precision = len(expected & predicted) / len(predicted)
    recall = len(expected & predicted) / len(expected)
    return 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

# Optimize
optimizer = BootstrapFewShot(metric=entity_f1, max_labeled_demos=4)
extractor = EntityExtractor()
optimized = optimizer.compile(extractor, trainset=trainset)

# Use the optimized module
result = optimized(text="Tesla's Elon Musk unveiled new models in Austin, Texas.")
print(result.entities)
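Because the optimizer maximizes the metric on the training data, it is worth scoring the result on examples it never saw. The following is a minimal sketch of that check in plain Python, assuming examples are simple (text, labels) pairs; `split_train_dev`, `average_metric`, `toy_program`, and `set_match` are hypothetical helpers, and `toy_program` stands in for the optimized module.

```python
import random

def split_train_dev(examples, dev_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a fraction for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]

def average_metric(program, devset, metric):
    """Mean metric score of the program over held-out (text, labels) pairs."""
    scores = [metric(labels, program(text)) for text, labels in devset]
    return sum(scores) / len(scores)

# Made-up data, for illustration only.
examples = [(f"Person{i} visited City{i}.", [f"Person{i}", f"City{i}"]) for i in range(10)]
train, dev = split_train_dev(examples)

# Toy program: treats every capitalized word as an entity.
def toy_program(text):
    return [w.strip(".,") for w in text.split() if w[0].isupper()]

def set_match(expected, predicted):
    return 1.0 if set(expected) == set(predicted) else 0.0

dev_score = average_metric(toy_program, dev, set_match)
print(len(train), len(dev), dev_score)
```

In a real run, only `train` would be passed to the optimizer, and a large gap between the training score and `dev_score` would suggest the chosen prompts and examples are overfitting the training set.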
When DSPy Helps
Multi-step pipelines where each step depends on the previous one. A RAG system with query rewriting, retrieval, answer generation, and citation verification has four LLM steps. Prompt quality compounds across steps: errors in step 1 propagate through the pipeline. DSPy can jointly optimize all four prompts.
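A rough back-of-the-envelope calculation shows how per-step quality compounds. The sketch below assumes that each step succeeds independently and that any failed step fails the whole pipeline; both are simplifications, and the 90% and 95% figures are illustrative, not measurements.

```python
import math

def pipeline_success_rate(step_rates):
    """End-to-end success rate when every step must succeed independently."""
    return math.prod(step_rates)

# Four steps: query rewriting, retrieval, answer generation, citation verification.
baseline = pipeline_success_rate([0.90] * 4)  # each step right 90% of the time
improved = pipeline_success_rate([0.95] * 4)  # each step right 95% of the time

print(f"baseline: {baseline:.4f}, improved: {improved:.4f}")
```

Under these assumptions, four steps at 90% each give roughly 66% end to end, while 95% per step gives roughly 81%, so a modest per-step gain produces a much larger gain for the pipeline as a whole.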
When you are switching between LLM providers. A pipeline optimized for GPT-4o can degrade significantly on Mistral 7B. Re-running DSPy optimization against the new model typically takes minutes to hours and yields prompts tuned to that model, whereas manual prompt adaptation can take days.
When you have labeled data. The optimizers shown here score candidate prompts against training examples with ground-truth labels. Roughly 50-500 labeled examples for your task is a workable range. If you do not have labels, you need a different approach.
When DSPy Does Not Help
Simple single-step applications. If your application calls an LLM once to do a straightforward task, DSPy's overhead (learning the framework, setting up optimization runs, managing labeled datasets) rarely produces better results than a well-written manual prompt.
When you have no labeled data. DSPy requires labeled examples. If you do not have them, you either need to create them (expensive) or use a different optimization approach.
Real-time, latency-sensitive optimization. DSPy optimization runs take minutes to hours, so optimization has to happen offline, before deployment. The optimized program is fast to run, but you cannot re-optimize within a request.
When the task changes frequently. DSPy optimization produces prompts optimized for a fixed task. If your task definition changes often, re-running optimization frequently is impractical.
Keep Reading
- LangChain vs LlamaIndex Comparison — Other frameworks for building LLM pipelines
- Open Source LLM Benchmarks 2026 — The models DSPy can optimize prompts for
- How Large Language Models Work — The underlying mechanics that explain why prompts matter