DSPy


LLM program optimization

General Dev Framework

What it's used for

DSPy is a framework that optimizes LLM prompts and pipelines programmatically instead of through manual prompt engineering. You define what you want (typed input/output signatures), and DSPy's optimizers automatically find the best prompts, few-shot examples, and configurations.
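To make "typed input/output signatures" concrete, here is a toy, plain-Python sketch (no DSPy import) of how a declared set of input and output fields plus an instruction string could be rendered into a prompt. The `Field` and `ToySignature` classes and the prompt layout are hypothetical, and DSPy formats its prompts differently. The point is that the fields stay fixed while the instruction text is the kind of parameter an optimizer is free to rewrite.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    name: str
    type_name: str


@dataclass
class ToySignature:
    """Hypothetical stand-in for a typed input/output spec (not DSPy's class)."""
    instructions: str
    inputs: list[Field] = field(default_factory=list)
    outputs: list[Field] = field(default_factory=list)

    def render(self, **values: str) -> str:
        # Instructions first, then each filled-in input, then the empty output slots the LM must complete.
        lines = [self.instructions, ""]
        for f in self.inputs:
            lines.append(f"{f.name} ({f.type_name}): {values[f.name]}")
        for f in self.outputs:
            lines.append(f"{f.name} ({f.type_name}):")
        return "\n".join(lines)


qa = ToySignature(
    instructions="Answer the question concisely.",
    inputs=[Field("question", "str")],
    outputs=[Field("answer", "str")],
)
print(qa.render(question="What is the capital of France?"))
```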

Key use cases include:

  • Automatic prompt optimization — replace hand-tuned prompts with data-driven optimization
  • Building modular LLM pipelines with typed, composable modules
  • Few-shot example selection — automatically choosing the best examples for in-context learning
  • Systematic model comparison — evaluate the same pipeline across different LLMs
  • Fine-tuning data generation — compile optimized prompts into fine-tuning datasets
  • Academic NLP research with reproducible LLM experiments
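At its core, systematic model comparison means holding the program, metric, and dev set fixed while swapping the model. The plain-Python sketch below shows that loop. `model_a` and `model_b` are hypothetical stubs standing in for calls to two different LLMs, and `evaluate` is an illustrative helper, not DSPy's own `dspy.Evaluate` utility.

```python
from typing import Callable

# Tiny labeled dev set: (question, gold answer) pairs.
DEVSET = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("color of the sky", "blue"),
]


def exact_match(gold: str, pred: str) -> bool:
    return gold.strip().lower() == pred.strip().lower()


def evaluate(model: Callable[[str], str], devset, metric) -> float:
    """Fraction of dev examples where the metric accepts the model's output."""
    hits = sum(metric(gold, model(q)) for q, gold in devset)
    return hits / len(devset)


# Hypothetical stub "models"; in practice each would wrap a different LLM.
def model_a(q: str) -> str:
    return {"2 + 2": "4", "capital of France": "Paris"}.get(q, "unknown")


def model_b(q: str) -> str:
    return {"2 + 2": "4", "capital of France": "paris", "color of the sky": "Blue"}.get(q, "unknown")


# Same dev set and metric for every model, so the scores are directly comparable.
scores = {name: evaluate(m, DEVSET, exact_match) for name, m in [("model_a", model_a), ("model_b", model_b)]}
print(scores)
```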

DSPy is used by ML engineers and researchers who want to move beyond trial-and-error prompt tweaking to systematic, reproducible LLM program optimization. It was developed by the Stanford NLP group and is used in both research and production settings.

The key insight: treat LLM prompts as learnable parameters that can be optimized on a small dataset, just like neural network weights.
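Here is a toy version of that insight in plain Python. The chosen few-shot demonstrations are the "parameter": each candidate set is scored with a metric on a small training set, and the best-scoring set is kept. This follows the same idea as random-search optimizers such as `BootstrapFewShotWithRandomSearch`, but it is an illustrative sketch, not DSPy's implementation. `toy_lm` is a hypothetical stand-in for an LLM that simply imitates the format of its first demonstration. The demo pool is fixed here, whereas DSPy's bootstrapping optimizers also generate demonstrations by running the program.

```python
import random

# Training set: the desired output format is "ID-<n>".
TRAINSET = [("7", "ID-7"), ("42", "ID-42"), ("305", "ID-305")]

# Pool of candidate demonstrations; only some use the desired format.
DEMO_POOL = [("1", "#1"), ("2", "ID-2"), ("3", "(3)"), ("4", "ID-4"), ("5", "5.")]


def toy_lm(demos, x: str) -> str:
    """Hypothetical LM stand-in: imitates the format of the first demonstration."""
    if not demos:
        return x  # zero-shot: echo the input unchanged
    demo_in, demo_out = demos[0]
    return demo_out.replace(demo_in, x)


def metric(gold: str, pred: str) -> bool:
    return gold == pred


def score(demos, trainset) -> float:
    """Average metric over the training set when prompting with `demos`."""
    return sum(metric(gold, toy_lm(demos, x)) for x, gold in trainset) / len(trainset)


def random_search(pool, trainset, k=2, trials=20, seed=0):
    """Sample `trials` demo subsets of size k and return the best-scoring one."""
    rng = random.Random(seed)
    best_demos, best_score = [], score([], trainset)  # baseline: no demos
    for _ in range(trials):
        candidate = rng.sample(pool, k)
        s = score(candidate, trainset)
        if s > best_score:
            best_demos, best_score = candidate, s
    return best_demos, best_score


demos, s = random_search(DEMO_POOL, TRAINSET)
print(demos, s)
```

The values of `k`, `trials`, and `seed` are arbitrary. More trials explore more candidate sets at the cost of more (in practice, paid) LM calls.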

Getting started

  1. Install DSPy:
    pip install dspy
  2. Configure your LLM:
    import dspy
    lm = dspy.LM('openai/gpt-4o', api_key='sk-...')
    dspy.configure(lm=lm)
  3. Define a module and optimize:
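    class QA(dspy.Signature):
        """Answer the question concisely."""
        question: str = dspy.InputField()
        answer: str = dspy.OutputField()
    
    qa = dspy.ChainOfThought(QA)
    
    # Example metric: case-insensitive exact match (replace with your own)
    def my_metric(example, pred, trace=None):
        return example.answer.strip().lower() == pred.answer.strip().lower()
    
    # Labeled training examples (add many more); with_inputs() marks the input fields
    examples = [dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question")]
    
    optimizer = dspy.BootstrapFewShotWithRandomSearch(metric=my_metric)
    optimized_qa = optimizer.compile(qa, trainset=examples)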
  4. Read the DSPy documentation for advanced optimizers and multi-module pipelines.
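A DSPy metric is an ordinary Python function that takes a gold `example`, the program's prediction `pred`, and an optional `trace`, and returns a bool or a number. For free-text answers, one possible `my_metric` is token-level F1, sketched below under the assumption that both objects expose an `answer` field (as the `QA` signature's output does). The usage lines use `SimpleNamespace` only so the snippet runs without DSPy installed.

```python
from collections import Counter
from types import SimpleNamespace


def my_metric(example, pred, trace=None) -> float:
    """Token-level F1 between the gold answer and the predicted answer."""
    gold = example.answer.lower().split()
    guess = pred.answer.lower().split()
    common = Counter(gold) & Counter(guess)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(guess)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# SimpleNamespace stands in for dspy.Example / dspy.Prediction here,
# since the metric only reads the `.answer` attribute.
gold = SimpleNamespace(answer="The Eiffel Tower")
pred = SimpleNamespace(answer="Eiffel Tower")
print(my_metric(gold, pred))
```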

Pricing: DSPy is free and open source under the MIT license. Optimization itself makes LLM API calls, so its cost depends on dataset size and the number of optimization trials. DSPy works with all major LLM providers.
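Because cost scales with dataset size and trial count, a back-of-envelope estimate is simple multiplication. The helper below is hypothetical and not part of DSPy. It assumes every trial evaluates the candidate program once on every example, with a fixed number of LM calls of a fixed average size per example, and it ignores bootstrapping overhead and caching. All numbers in the usage line are placeholders to replace with your own.

```python
def estimate_optimization_cost(
    num_examples: int,
    num_trials: int,
    calls_per_example: int,
    tokens_per_call: int,
    usd_per_million_tokens: float,
) -> tuple[int, float]:
    """Rough (LM calls, USD) estimate for one optimization run under the stated assumptions."""
    calls = num_examples * num_trials * calls_per_example
    usd = calls * tokens_per_call * usd_per_million_tokens / 1_000_000
    return calls, usd


# Placeholder inputs: 200 examples, 20 trials, 2 LM calls each, ~1,000 tokens per call, $5 per 1M tokens.
calls, usd = estimate_optimization_cost(200, 20, 2, 1_000, 5.0)
print(f"{calls} calls, about ${usd:.2f}")
```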

Case studies

Real DSPy projects


Self-Improving Prompt System — 71% → 94% Accuracy

B2B SaaS platform, customer feedback classification

Challenge

A customer feedback classification system was stuck at 71% accuracy despite weeks of manual prompt iteration. Each improvement attempt required extensive evaluation and re-testing.

Solution

Rebuilt the system using DSPy with MIPROv2 optimization. Defined the task as a DSPy module with typed inputs/outputs, then let the optimizer automatically discover effective prompts and few-shot examples using the evaluation dataset.

Results

Accuracy improved from 71% to 94% over 6 weeks, entirely through automated optimization. The DSPy program also adapts when new feedback categories appear, without manual prompt rewriting. The team now ships classification improvements 5x faster.


Multi-Hop QA Pipeline — 61% → 83% Accuracy via Auto-Optimization

Enterprise knowledge management company

Challenge

A multi-hop question-answering system requiring cross-document reasoning was stuck at 61% accuracy. Manual prompt optimization had plateaued — human intuition about what makes good multi-hop prompts was exhausted.

Solution

Modeled the multi-hop reasoning as a DSPy chain-of-thought program with typed intermediate steps. Used the MIPRO optimizer with a curated 500-question evaluation set to automatically discover effective reasoning patterns.
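To illustrate what "typed intermediate steps" can mean, here is a plain-Python sketch of a two-hop pipeline. Each hop emits a typed `Hop` record (the query it issued and the passages it retrieved), and the next step consumes that record. This is neither the company's system nor DSPy's API. The in-memory `CORPUS`, the substring `retrieve` function, and the string heuristics in `generate_query` and `generate_answer` are all hypothetical stand-ins. In a DSPy program, the query-writing and answering steps would typically be LM-backed modules (for example `dspy.ChainOfThought`) whose prompts the optimizer tunes.

```python
from dataclasses import dataclass

# Hypothetical in-memory corpus keyed by the entity each passage is about.
CORPUS = {
    "hamlet": "Hamlet was written by William Shakespeare.",
    "william shakespeare": "William Shakespeare was born in Stratford-upon-Avon.",
    "paris": "Paris is the capital of France.",
}


@dataclass
class Hop:
    """Typed intermediate step: the query issued and the passages it returned."""
    query: str
    passages: list[str]


def retrieve(query: str) -> list[str]:
    """Toy retriever: return passages whose key entity appears in the query."""
    q = query.lower()
    return [text for key, text in CORPUS.items() if key in q]


def generate_query(question: str, hops: list[Hop]) -> str:
    """Stand-in for an LM step that writes the next search query."""
    if not hops:
        return question
    # Hypothetical heuristic: the bridge entity follows " by " in the last passage.
    bridge = hops[-1].passages[0].rstrip(".").split(" by ")[-1]
    return f"{bridge} born"


def generate_answer(hops: list[Hop]) -> str:
    """Stand-in for an LM step that reads the final passage and answers."""
    return hops[-1].passages[0].rstrip(".").split(" in ")[-1]


def multi_hop(question: str, num_hops: int = 2) -> tuple[list[Hop], str]:
    hops: list[Hop] = []
    for _ in range(num_hops):
        query = generate_query(question, hops)
        hops.append(Hop(query=query, passages=retrieve(query)))
    return hops, generate_answer(hops)


hops, answer = multi_hop("Where was the author of Hamlet born?")
for hop in hops:
    print(hop)
print(answer)
```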

Results

Accuracy improved from 61% to 83%, a 22-point gain, with no manual prompt engineering. Inference cost also dropped 15% because the optimized prompts are more concise. The system now improves automatically when new training data arrives.
