DSPy

LLM program optimization

2 case studies
2 specialists
General Dev Framework

What it's used for

DSPy is used to optimize LLM prompts and pipelines programmatically instead of manually tweaking prompt text. You define your pipeline as composable modules with typed signatures, then DSPy's optimizers automatically find the best prompts, few-shot examples, and fine-tuning data to maximize a metric you define.

Getting started

Install with `pip install dspy-ai`, then configure a language model; recent DSPy versions use `dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))`, and any LiteLLM-supported provider works. Define a Signature describing inputs and outputs, wrap it in a Module, then use an optimizer like BootstrapFewShot with a small labeled dataset. The optimizer compiles your program into an optimized version.

$ pip install dspy-ai

Case studies

Real DSPy projects

Self-Improving Prompt System — 71% → 94% Accuracy

B2B SaaS platform, customer feedback classification

Challenge

A customer feedback classification system was stuck at 71% accuracy despite weeks of manual prompt iteration. Each improvement attempt required extensive evaluation and re-testing.

Solution

Rebuilt the system using DSPy with MIPROv2 optimization. Defined the task as a DSPy module with typed inputs/outputs, then let the optimizer automatically discover effective prompts and few-shot examples using the evaluation dataset.

Results

Accuracy improved from 71% to 94% over six weeks, entirely automatically. The DSPy program also adapts to new feedback categories without manual prompt rewriting, and the team now ships classification improvements 5x faster.

Multi-Hop QA Pipeline — 61% → 83% Accuracy via Auto-Optimization

Enterprise knowledge management company

Challenge

A multi-hop question-answering system requiring cross-document reasoning was stuck at 61% accuracy. Manual prompt optimization had plateaued — human intuition about what makes good multi-hop prompts was exhausted.

Solution

Modeled the multi-hop reasoning as a DSPy chain-of-thought program with typed intermediate steps. Used MIPRO optimizer with a curated 500-question evaluation set to automatically discover the optimal reasoning patterns.

Results

Accuracy improved from 61% to 83%, a 22-point gain achieved with no manual prompt engineering. Inference cost also dropped 15% because the optimized prompts are more concise, and the system now re-optimizes automatically when new training data arrives.

Used DSPy professionally?

Add your case study and get discovered by clients.

Submit a case study

Need a DSPy expert?

Submit a brief and we'll match you with vetted specialists who have proven DSPy experience.

Submit a brief — it's free