What it's used for

DSPy is a framework for optimizing LLM prompts and pipelines programmatically rather than through manual prompt engineering. You declare what you want as typed input/output signatures, and DSPy's optimizers search for prompts, few-shot examples, and configurations that score well on a metric you supply.

Key use cases include:

Automatic prompt optimization: replace hand-tuned prompts with data-driven optimization
Modular LLM pipelines: build programs from typed, composable modules
Few-shot example selection: automatically choose effective examples for in-context learning
Systematic model comparison: evaluate the same pipeline across different LLMs
Fine-tuning data generation: use an optimized program to produce training data for fine-tuning
Academic NLP research: run reproducible LLM experiments
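
To make the model-comparison use case concrete, here is a minimal plain-Python sketch (not DSPy code). It holds the pipeline and the metric fixed and swaps only the model. The names `pipeline`, `compare_models`, `model_a`, and `model_b` are hypothetical, and the two "models" are hard-coded stand-ins for real clients. In DSPy the same idea is usually expressed by running one program under different `dspy.LM` instances, for example inside `dspy.context(lm=...)`.

```python
from typing import Callable

LM = Callable[[str], str]  # a model is anything that maps a prompt to text


def pipeline(lm: LM, question: str) -> str:
    """The fixed program under test: one prompt template, one model call."""
    return lm(f"Answer in one word.\nQuestion: {question}\nAnswer:").strip()


def exact_match(gold: str, predicted: str) -> bool:
    """Case-insensitive exact match."""
    return gold.lower() == predicted.lower()


def compare_models(models: dict[str, LM], devset: list[tuple[str, str]]) -> dict[str, float]:
    """Run the same pipeline and metric against every model; return accuracy per model."""
    return {
        name: sum(exact_match(gold, pipeline(lm, q)) for q, gold in devset) / len(devset)
        for name, lm in models.items()
    }


# Hypothetical stand-ins for real model clients.
def model_a(prompt: str) -> str:
    return "Paris" if "France" in prompt else "unknown"


def model_b(prompt: str) -> str:
    if "France" in prompt:
        return " paris "
    return " Tokyo" if "Japan" in prompt else "unknown"


devset = [("Capital of France?", "Paris"), ("Capital of Japan?", "Tokyo")]
report = compare_models({"model_a": model_a, "model_b": model_b}, devset)
print(report)
```

Because only the model varies, any difference in the report can be attributed to the model rather than to prompt wording.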

DSPy is used by ML engineers and researchers who want to move beyond trial-and-error prompt tweaking to systematic, reproducible LLM program optimization. It is developed at Stanford NLP and has gained strong adoption in research and production settings.

The key insight: treat prompts (instructions and few-shot demonstrations) as learnable parameters that are tuned against a metric on a small dataset, much as neural network weights are fit to training data.
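
A toy illustration of that idea, in plain Python rather than DSPy: here the "parameter" being learned is which demonstration goes into the prompt, and "training" is a search for the demonstration set that scores best on a small dev set. Everything in it is an assumption made for illustration. `toy_lm` is a hard-coded stand-in that copies answer casing from its demonstrations, and `select_demos` enumerates candidates exhaustively, whereas real optimizers sample candidates and use richer search strategies.

```python
from itertools import combinations

CAPITALS = {"France": "Paris", "Japan": "Tokyo", "Peru": "Lima", "Kenya": "Nairobi"}


def toy_lm(prompt: str) -> str:
    """Stand-in model: knows the facts, copies answer casing from the demos."""
    *demos, query = prompt.split("\n\n")
    answer = next(cap for country, cap in CAPITALS.items() if country in query)
    shout = any(d.rsplit("A: ", 1)[-1].isupper() for d in demos)
    return answer.upper() if shout else answer


def build_prompt(demos, question: str) -> str:
    """Render few-shot demos followed by the new question."""
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in demos)
    return f"{shots}Q: {question}\nA:"


def score(lm, demos, devset) -> float:
    """Fraction of dev questions answered exactly right with these demos."""
    hits = sum(lm(build_prompt(demos, q)).strip() == a for q, a in devset)
    return hits / len(devset)


def select_demos(lm, pool, devset, k=1):
    """Keep the size-k demo set that beats the zero-shot baseline by the most."""
    best_demos, best_score = (), score(lm, (), devset)
    for candidate in combinations(pool, k):
        s = score(lm, candidate, devset)
        if s > best_score:
            best_demos, best_score = candidate, s
    return list(best_demos), best_score


# The dev set expects upper-case answers; only one demo in the pool shows that style.
devset = [("Capital of France?", "PARIS"), ("Capital of Kenya?", "NAIROBI")]
pool = [("Capital of Japan?", "Tokyo"), ("Capital of Peru?", "LIMA")]

demos, accuracy = select_demos(toy_lm, pool, devset, k=1)
print(demos, accuracy)
```

The search never edits prompt text by hand. It only compares candidates by score, which is the shift in workflow the paragraph above describes.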

Getting started

Install DSPy:
```
pip install dspy
```

Configure your LLM:

```
import dspy

lm = dspy.LM('openai/gpt-4o', api_key='sk-...')
dspy.configure(lm=lm)
```

Define a module and optimize:

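```
class QA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

qa = dspy.ChainOfThought(QA)

# my_metric and examples are placeholders you must supply:
#   my_metric(example, prediction, trace=None) returns a bool or numeric score
#   examples is a list of dspy.Example objects with their input fields marked
optimizer = dspy.BootstrapFewShotWithRandomSearch(metric=my_metric)
optimized_qa = optimizer.compile(qa, trainset=examples)
```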
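
The snippet above leaves `my_metric` and `examples` undefined. As a hedged sketch, a metric can be an ordinary function that DSPy calls with `(example, prediction, trace=None)`. The exact-match rule below is only one possible choice, and `SimpleNamespace` is used as a stand-in for DSPy's example and prediction objects so the block runs without DSPy installed or an API key.

```python
from types import SimpleNamespace


def my_metric(example, prediction, trace=None):
    """Exact match on the answer field, ignoring case and surrounding whitespace."""
    return example.answer.strip().lower() == prediction.answer.strip().lower()


# SimpleNamespace stands in for dspy.Example / dspy.Prediction (assumption for this demo).
gold = SimpleNamespace(question="What is the capital of France?", answer="Paris")
good = SimpleNamespace(answer=" paris ")
bad = SimpleNamespace(answer="Lyon")

print(my_metric(gold, good), my_metric(gold, bad))  # prints: True False
```

In a real run, each training item would typically be built as `dspy.Example(question=..., answer=...).with_inputs("question")`, and the resulting list passed as `trainset`.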

Read the DSPy documentation for advanced optimizers and multi-module pipelines.

Pricing: DSPy is free and open source (MIT license). Running an optimizer makes LLM API calls, so cost scales with dataset size and the number of optimization trials. DSPy works with the major hosted LLM providers as well as locally served models.
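
For a rough sense of how optimization cost scales, the back-of-envelope estimate below multiplies examples by trials by tokens. It is a sketch under loud assumptions: `estimate_optimization_cost` is a hypothetical helper, the token counts and per-million-token prices are placeholders rather than any provider's actual rates, and it assumes every trial scores every example once. Real optimizers may use minibatches, caching, or extra proposal calls, so treat the result as an order-of-magnitude guide.

```python
def estimate_optimization_cost(
    train_examples: int,
    trials: int,
    calls_per_example: int = 1,
    input_tokens_per_call: int = 800,      # placeholder assumption
    output_tokens_per_call: int = 200,     # placeholder assumption
    usd_per_million_input: float = 2.50,   # placeholder price, not a real quote
    usd_per_million_output: float = 10.00, # placeholder price, not a real quote
) -> float:
    """Estimate USD spent if every trial evaluates every training example."""
    calls = train_examples * trials * calls_per_example
    input_cost = calls * input_tokens_per_call * usd_per_million_input / 1_000_000
    output_cost = calls * output_tokens_per_call * usd_per_million_output / 1_000_000
    return round(input_cost + output_cost, 2)


# 200 examples x 20 trials = 4,000 calls under the assumptions above.
print(estimate_optimization_cost(train_examples=200, trials=20))  # prints: 16.0
```

Halving either the dataset size or the number of trials halves the estimate, which is why small, well-chosen training sets keep optimization runs cheap.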

Case studies

Real DSPy projects

Submitted by verified specialists

Self-Improving Prompt System — 71% → 94% Accuracy

B2B SaaS platform, customer feedback classification

Challenge

A customer feedback classification system was stuck at 71% accuracy despite weeks of manual prompt iteration. Each improvement attempt required extensive evaluation and re-testing.

Solution

The system was rebuilt in DSPy with MIPROv2 optimization. The task was defined as a DSPy module with typed inputs and outputs, and the optimizer then searched for effective prompts and few-shot examples against the evaluation dataset.

Results

Accuracy improved from 71% to 94% over 6 weeks, with the gains coming from automated optimization rather than manual prompt edits. The DSPy program also adapts to new feedback categories without manual prompt rewriting. The team now ships classification improvements 5x faster.

Tools

DSPy, OpenAI, DeepEval

Multi-Hop QA Pipeline — 61% → 83% Accuracy via Auto-Optimization

Enterprise knowledge management company

Challenge

A multi-hop question-answering system requiring cross-document reasoning was stuck at 61% accuracy. Manual prompt optimization had plateaued: the team had run out of intuitions about what makes a good multi-hop prompt.

Solution

The multi-hop reasoning was modeled as a DSPy chain-of-thought program with typed intermediate steps. The MIPRO optimizer was then run against a curated 500-question evaluation set to search for effective reasoning patterns.

Results

Accuracy improved from 61% to 83%, a 22-point gain with no manual prompt engineering. Inference cost also dropped 15% because the optimized prompts are more concise. The system now improves automatically when new training data arrives.

Tools

DSPy, Hugging Face, DeepEval
