DSPy is a framework that optimizes LLM prompts and pipelines programmatically rather than through manual prompt engineering. You declare what you want as typed input/output signatures, and DSPy's optimizers search for effective prompts, few-shot examples, and configurations against a metric you supply.
DSPy is used by ML engineers and researchers who want to move beyond trial-and-error prompt tweaking to systematic, reproducible LLM program optimization. It is developed at Stanford NLP and has gained strong adoption in research and production settings.
The key insight: treat LLM prompts as learnable parameters that can be optimized on a small dataset, just like neural network weights.
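To make the "prompts as learnable parameters" idea concrete, here is a minimal sketch in plain Python. It is not DSPy's algorithm: `fake_lm`, `score`, and `pick_best_instruction` are hypothetical names invented for illustration, and the "model" is a hard-coded stub so the example runs without any API access. It only shows the general shape of the approach: score each candidate instruction on a small labeled dataset and keep the best one.

```python
# Hypothetical stand-in for an LLM call. It returns a bare answer only when
# the instruction asks for one word; a real system would call a model here.
def fake_lm(instruction: str, question: str) -> str:
    answers = {"Capital of France?": "Paris", "2 + 2?": "4"}
    answer = answers.get(question, "unknown")
    if "one word" in instruction:
        return answer
    return f"The answer is {answer}."


def exact_match(expected: str, predicted: str) -> bool:
    """Compare answers, ignoring case and surrounding whitespace."""
    return expected.strip().lower() == predicted.strip().lower()


def score(instruction, dataset, lm) -> float:
    """Fraction of dataset questions answered correctly under this instruction."""
    hits = sum(exact_match(gold, lm(instruction, q)) for q, gold in dataset)
    return hits / len(dataset)


def pick_best_instruction(candidates, dataset, lm) -> str:
    """Treat the instruction as the parameter and keep the highest scorer."""
    return max(candidates, key=lambda c: score(c, dataset, lm))


dataset = [("Capital of France?", "Paris"), ("2 + 2?", "4")]
candidates = ["Answer the question.", "Answer in one word."]
best = pick_best_instruction(candidates, dataset, fake_lm)
print(best)
```

Real optimizers search a much larger space (generated instructions, bootstrapped few-shot examples) and evaluate against an actual model, but the loop of propose, score on data, and keep the winner is the same idea.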
pip install dspy

import dspy

# Configure the language model DSPy will call
lm = dspy.LM('openai/gpt-4o', api_key='sk-...')
dspy.configure(lm=lm)

# Declare the task as a typed signature
class QA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

qa = dspy.ChainOfThought(QA)

# my_metric and examples are not defined here; supply your own scoring
# function and training examples before running the optimizer
optimizer = dspy.BootstrapFewShotWithRandomSearch(metric=my_metric)
optimized_qa = optimizer.compile(qa, trainset=examples)

Pricing: DSPy is free and open source (MIT). Optimization requires LLM API calls, so cost depends on dataset size and the number of optimization trials. Works with all major LLM providers.
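The quick start above passes `my_metric` and `examples` to the optimizer without defining them. As a hedged sketch, a metric could look like the function below. It assumes DSPy's convention of calling a metric as `metric(example, pred, trace=None)`; the exact-match rule is only one possible choice. To keep the sketch runnable without DSPy installed, `SimpleNamespace` objects stand in for the example and prediction; in a real program the training items would typically be built with `dspy.Example(question=..., answer=...).with_inputs("question")`.

```python
from types import SimpleNamespace


def my_metric(example, pred, trace=None):
    """Return True when the predicted answer matches the gold answer,
    ignoring case and surrounding whitespace. The trace argument is
    accepted for compatibility with the metric convention but unused here."""
    return example.answer.strip().lower() == pred.answer.strip().lower()


# Stand-ins for a gold example and two predictions (illustrative data only).
gold = SimpleNamespace(question="What is the capital of France?", answer="Paris")
good = SimpleNamespace(answer=" paris ")
bad = SimpleNamespace(answer="Lyon")

print(my_metric(gold, good), my_metric(gold, bad))
```

Exact match is strict; for free-form answers a softer check (token overlap, or an LLM-based judge) may fit better, at the price of extra calls during optimization.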
Case studies
B2B SaaS platform, customer feedback classification
A customer feedback classification system was stuck at 71% accuracy despite weeks of manual prompt iteration. Each improvement attempt required extensive evaluation and re-testing.
The team rebuilt the system using DSPy with MIPROv2 optimization. They defined the task as a DSPy module with typed inputs/outputs, then let the optimizer discover effective prompts and few-shot examples using the evaluation dataset.
Accuracy improved from 71% to 94% over 6 weeks, driven entirely by automated optimization. When new feedback categories appear, the DSPy program adapts without manual prompt rewriting. The team now ships classification improvements 5x faster.
Enterprise knowledge management company
A multi-hop question-answering system requiring cross-document reasoning was stuck at 61% accuracy. Manual prompt optimization had plateaued: the team's intuition about what makes a good multi-hop prompt was exhausted.
The team modeled the multi-hop reasoning as a DSPy chain-of-thought program with typed intermediate steps, then used the MIPRO optimizer with a curated 500-question evaluation set to discover effective reasoning patterns automatically.
Accuracy improved from 61% to 83%, a 22-point gain with no manual prompt engineering. Inference cost also dropped 15% because the optimized prompts are more concise. The system now improves automatically when new training data arrives.
Thought leaders
Omar Khattab, Stanford NLP / Databricks
Creator of DSPy, the framework for programming—not prompting—language models. PhD from Stanford NLP, advised by Matei Zaharia and Christopher Potts. Research Scientist at Databricks. DSPy has been adopted by both ML engineers and non-ML software engineers for building modular AI systems with automatic prompt optimization.
Matei Zaharia, Databricks
Co-founder and CTO of Databricks and professor at UC Berkeley. Created Apache Spark and MLflow, two of the most influential open-source projects in data engineering and MLOps. MLflow has millions of downloads.