DSPy (Declarative Self-improving Python) is a framework for building LLM-powered programs that optimizes prompts and few-shot examples automatically, rather than requiring you to hand-write them. Instead of crafting a prompt like "You are a helpful assistant that extracts named entities from text. Here are some examples:...", you define input and output signatures in Python and let DSPy find optimal prompts and examples using your data and a metric. DSPy is most valuable for complex multi-step LLM pipelines where prompt quality significantly impacts output quality and where you have labeled data to optimize against. For simple single-step applications (one LLM call with a straightforward task), the overhead of DSPy rarely justifies itself.
The Core Idea
Standard LLM development has a frustrating property: the best prompt for a task is highly sensitive to the specific model, task formulation, and examples you choose. Prompts that work well for GPT-4o often fail on Claude or Mistral. Prompts that work on a development set sometimes degrade on production traffic. Hand-tuning prompts is time-consuming and does not transfer across models.
DSPy's approach: treat prompts as hyperparameters to optimize rather than code to write. You specify:
The signature (input fields and output fields with descriptions)
The metric (how to evaluate if an output is good)
The training data (labeled examples of good inputs and outputs)
DSPy's optimizer then searches for prompts and few-shot examples that maximize the metric on your training data.
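The "prompts as hyperparameters" idea can be sketched without any framework. The toy below is not DSPy's actual algorithm: `fake_lm`, `TRAINSET`, `DEMO_POOL`, and `search` are hypothetical names invented for illustration, and `fake_lm` is a deterministic stand-in for a model call. It only shows the shape of the loop: enumerate candidate (instruction, demos) pairs, score each against labeled data with a metric, keep the best.

```python
from itertools import product

# Labeled examples: (text, expected entities). Hypothetical toy data.
TRAINSET = [
    ("Paris is in France.", {"Paris", "France"}),
    ("Alice met Bob.", {"Alice", "Bob"}),
    ("The cat sat.", set()),
]

# Candidate few-shot demos, kept separate from the examples we score on.
DEMO_POOL = [
    ("The dog ran.", set()),
    ("A bird flew to Rome.", {"Rome"}),
]

INSTRUCTIONS = ["List the words.", "Extract named entities."]


def fake_lm(instruction, demos, text):
    """Hypothetical stand-in for a model call. This is NOT a real LLM."""
    words = [w.strip(".,") for w in text.split()]
    if "entities" in instruction:
        # Pretend a specific instruction makes the model keep capitalized words.
        words = [w for w in words if w[:1].isupper()]
    # Pretend the demos teach the model which words are not entities.
    rejected = {
        w.strip(".,")
        for demo_text, demo_entities in demos
        for w in demo_text.split()
        if w.strip(".,") not in demo_entities
    }
    return {w for w in words if w not in rejected}


def metric(expected, predicted):
    """Exact-match metric: 1.0 if the predicted set equals the label."""
    return 1.0 if expected == predicted else 0.0


def search():
    """Try every (instruction, number of demos) pair; keep the best scorer."""
    best = None
    for instruction, k in product(INSTRUCTIONS, range(len(DEMO_POOL) + 1)):
        demos = DEMO_POOL[:k]
        total = sum(
            metric(expected, fake_lm(instruction, demos, text))
            for text, expected in TRAINSET
        )
        score = total / len(TRAINSET)
        if best is None or score > best[0]:
            best = (score, instruction, k)
    return best


best_score, best_instruction, best_k = search()
print(best_score, best_instruction, best_k)
```

In this toy, the specific instruction alone still mislabels "The" as an entity, and adding one demo fixes it, so the search settles on that instruction with one demo. A real optimizer explores a far larger space and pays for every evaluation in model calls, but the objective has the same form.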
Core Concepts
Signatures: Typed input/output declarations
import dspy

class ExtractEntities(dspy.Signature):
    """Extract named entities from text."""
    text: str = dspy.InputField()
    entities: list[str] = dspy.OutputField(desc="List of named entities (people, organizations, locations)")
Putting the pieces together, from signature to optimized module:
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the LLM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Define signature
class ExtractEntities(dspy.Signature):
    """Extract named entities from text."""
    text: str = dspy.InputField()
    entities: list[str] = dspy.OutputField()

# Define module
class EntityExtractor(dspy.Module):
    def __init__(self):
        super().__init__()
        self.extractor = dspy.Predict(ExtractEntities)

    def forward(self, text):
        return self.extractor(text=text)

# Training data (labeled examples)
trainset = [
    dspy.Example(
        text="Apple CEO Tim Cook announced the new iPhone at WWDC in San Francisco.",
        entities=["Apple", "Tim Cook", "iPhone", "WWDC", "San Francisco"]
    ).with_inputs("text"),
    # ... more examples
]

# Define metric: F1 over exact string matches between label and prediction
def entity_f1(example, pred, trace=None):
    expected = set(example.entities)
    predicted = set(pred.entities) if isinstance(pred.entities, list) else set()
    overlap = len(expected & predicted)
    # No overlap (including an empty label or prediction) scores 0
    # and avoids dividing by zero below
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)

# Optimize
optimizer = BootstrapFewShot(metric=entity_f1, max_labeled_demos=4)
extractor = EntityExtractor()
optimized = optimizer.compile(extractor, trainset=trainset)

# Use the optimized module
result = optimized(text="Tesla's Elon Musk unveiled new models in Austin, Texas.")
print(result.entities)
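To make the F1 metric concrete, here is the same arithmetic on plain sets, worked for the Tesla sentence. The `expected` labels and the `predicted` output are hypothetical, chosen to show how exact string matching treats a merged entity; `set_f1` is an illustrative helper, not part of DSPy.

```python
def set_f1(expected: set[str], predicted: set[str]) -> float:
    """F1 over exact string matches between two entity sets."""
    overlap = len(expected & predicted)
    if overlap == 0:
        # Covers empty inputs and avoids division by zero.
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and a hypothetical model output.
expected = {"Tesla", "Elon Musk", "Austin", "Texas"}
predicted = {"Tesla", "Elon Musk", "Austin, Texas"}

# overlap = 2, precision = 2/3, recall = 2/4, so F1 = 4/7
score = set_f1(expected, predicted)
print(round(score, 3))
```

The model found every entity a human would accept, yet the score is only about 0.57 because "Austin, Texas" matches neither "Austin" nor "Texas". Since the optimizer maximizes whatever the metric rewards, it can be worth normalizing strings or relaxing the match before trusting the number.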
When DSPy Helps
Multi-step pipelines where each step depends on the previous one. A RAG system with query rewriting, retrieval, answer generation, and citation verification has four steps, most of them LLM calls. Prompt quality compounds across steps: errors in step 1 propagate through the rest of the pipeline. DSPy can jointly optimize the prompts for all the LLM steps against a single end-to-end metric.
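A rough back-of-the-envelope calculation shows why errors compound. The per-step accuracies below are made-up numbers, and the model assumes each step succeeds independently and that any single failure ruins the final answer. Both are simplifications.

```python
import math

# Hypothetical per-step success rates for a four-step pipeline.
baseline_steps = [0.90, 0.90, 0.90, 0.90]
improved_steps = [0.95, 0.95, 0.95, 0.95]

# Under the independence assumption, end-to-end success is the product.
baseline_end_to_end = math.prod(baseline_steps)  # 0.9 ** 4, about 0.656
improved_end_to_end = math.prod(improved_steps)  # 0.95 ** 4, about 0.815

print(f"{baseline_end_to_end:.3f} -> {improved_end_to_end:.3f}")
```

Under these assumptions, a five-point gain at each step turns into roughly a sixteen-point gain end to end, which is why optimizing the steps together can pay off more than polishing one prompt in isolation.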
When you are switching between LLM providers. A pipeline optimized for GPT-4o can degrade significantly on Mistral 7B. Re-running DSPy optimization against the new LLM typically takes hours and yields prompts tuned to that model, whereas adapting the prompts by hand can take days.
When you have labeled data. DSPy requires training examples with ground truth labels to optimize against. If you have 50-500 labeled examples for your task, DSPy can use them. If you do not have labels, you need a different approach.
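With a few hundred labeled examples, it usually helps to hold some back so the optimized program is scored on data it never saw as demos. A minimal sketch, assuming a 75/25 split; `split_examples` is a hypothetical helper, and the placeholder strings stand in for whatever example objects you use (for DSPy, `dspy.Example` instances).

```python
import random


def split_examples(examples, dev_fraction=0.25, seed=0):
    """Shuffle deterministically, then split into (trainset, devset)."""
    shuffled = list(examples)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Placeholder data standing in for 100 labeled examples.
examples = [f"example-{i}" for i in range(100)]
trainset, devset = split_examples(examples)

print(len(trainset), len(devset))
```

The optimizer would see only `trainset`; the metric averaged over `devset` then gives a less optimistic estimate of how the optimized prompts behave on new inputs.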
When DSPy Does Not Help
Simple single-step applications. If your application calls an LLM once to do a straightforward task, DSPy's overhead (learning the framework, setting up optimization runs, managing labeled datasets) rarely produces better results than a well-written manual prompt.
When you have no labeled data. DSPy requires labeled examples. If you do not have them, you either need to create them (expensive) or use a different optimization approach.
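Applications that would need to optimize in real time. DSPy optimization runs take minutes to hours, so they have to happen offline, before deployment. The optimized program is fast to run; only the optimization process itself is slow. If your use case needs prompts adapted on the fly, per request, DSPy is not the right tool.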
When the task changes frequently. DSPy optimization produces prompts optimized for a fixed task. If your task definition changes often, re-running optimization frequently is impractical.
Written by
Mahmudul Haque Qudrati
CEO & ML Engineer
CEO and ML Engineer at Pristren. Builds AI-powered software for teams and writes about machine learning, LLMs, developer tools, and practical AI applications.