DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

Arnav Singhvi, Manish Shetty, Shangyin Tan, Christopher Potts, Koushik Sen, Matei Zaharia, Omar Khattab

TL;DR

LMs often produce outputs that violate application constraints; the paper introduces LM Assertions as a programming primitive to enforce computational constraints in LM pipelines. By integrating LM Assertions into DSPy, the authors demonstrate three assertion-driven optimizations—backtracking, example bootstrapping, and counterexample bootstrapping—that enable self-refinement and principled prompt optimization. Across four knowledge-intensive tasks derived from HotPotQA, LM Assertions improve constraint compliance and downstream performance, with substantial gains in both intrinsic and extrinsic metrics. The work provides a practical, extensible framework for building more reliable, self-correcting LM pipelines.

Abstract

Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic "prompt engineering". We introduce LM Assertions, a programming construct for expressing computational constraints that LMs should satisfy. We integrate our constructs into the recent DSPy programming model for LMs, and present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems. We also propose strategies to use assertions at inference time for automatic self-refinement with LMs. We report on four diverse case studies for text generation and find that LM Assertions improve not only compliance with imposed rules but also downstream task performance, passing constraints up to 164% more often and generating up to 37% more higher-quality responses. Our reference implementation of LM Assertions is integrated into DSPy at https://github.com/stanfordnlp/dspy
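
The inference-time self-refinement described in the abstract can be illustrated with a minimal sketch in plain Python. It does not use DSPy and is not the authors' implementation: the names `refine`, `AssertionFailure`, and `fake_lm`, the shape of the feedback dictionary, and the retry count are all assumptions made for illustration. The idea shown is only that a failed constraint triggers a retry in which the previous output and an error message are passed back to the generator, that a soft constraint (suggestion) lets the pipeline continue after retries are exhausted, and that a hard constraint (assertion) stops it.

```python
from typing import Callable, Optional


class AssertionFailure(Exception):
    """Raised when a hard constraint still fails after all retries (hypothetical name)."""


def refine(
    generate: Callable[[Optional[dict]], str],
    constraint: Callable[[str], bool],
    message: str,
    max_retries: int = 2,
    hard: bool = False,
) -> str:
    """Call `generate`, retrying with feedback while `constraint` fails.

    `generate` receives None on the first call and, on retries, a dict holding
    the rejected output and the user-defined error message.
    """
    output = generate(None)
    for _ in range(max_retries):
        if constraint(output):
            return output
        # Backtrack: retry the same step with the failure made explicit.
        feedback = {"past_output": output, "instruction": message}
        output = generate(feedback)
    if constraint(output) or not hard:
        # Soft constraints (suggestions) let the pipeline continue regardless.
        return output
    raise AssertionFailure(message)


def fake_lm(feedback: Optional[dict]) -> str:
    """Stand-in for an LM call: too long at first, shorter once given feedback."""
    if feedback is None:
        return "x" * 150
    return "short query"


result = refine(
    fake_lm,
    constraint=lambda q: len(q) < 100,
    message="Query should be less than 100 characters",
)
print(result)
```

Here the first generation violates the length constraint, so the stand-in generator is called again with the rejected output and the message, and the second attempt passes. In a real pipeline the feedback would be rendered into the retry prompt rather than handed to a Python function.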

Paper Structure

This paper contains 45 sections, 2 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: DSPy program with LM Assertions for multi-hop question-answering task with a retriever. We introduce two soft assertions (suggestions): (1) query to retriever should be less than 100 characters; (2) query to retriever should differ from previous queries. For instance, if the second suggestion fails, DSPy will construct a new prompt to retry the module with additional fields, highlighting the previously generated query and a user-defined error message to help the LM refine its generation.
  • Figure 2: Evaluation of each task on the validation set (Dev) and the test set (Test). Tasks are described in the tasks section, and LM pipeline configurations are described in the experiment configurations table. For each task, we use the same LM pipeline program except for the LM Assertions. Extrinsic metrics (downstream application performance) are highlighted in grey. For each metric, higher is always better. The highest value in each column is in bold.
  • Figure 3: DSPy program with LM Assertions for long-form paragraph multi-hop question answering task with a retriever. We introduce two suggestions: (1) asserting every 1-2 sentences has a citation; (2) every text segment preceding a citation is faithful to its cited reference.
  • Figure 4: DSPy program with LM Assertions for quiz question choice generation. We introduce 3 suggestions: (1) asserting JSON format; (2) correct answer is included; (3) plausible distractor choices are present.
  • Figure 5: DSPy program with LM Assertions for tweet generation. We introduce 5 suggestions: (1) asserting no hashtags; (2) correct answer is included; (3) tweet is within character limit; (4) tweet is engaging; (5) tweet is faithful to context.
  • ...and 1 more figure
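
Several of the constraints named in the figure captions above are purely programmatic, so they can be sketched as ordinary Python predicates. The function names below are hypothetical and are not taken from the paper or from DSPy. The 100-character limit comes from the Figure 1 caption. Treating "JSON format" as "parses as a JSON object" is an assumption, as is detecting hashtags by the `#` character. Constraints such as "tweet is engaging" or "faithful to context" need a judgment step and are not covered here.

```python
import json
from typing import List


def query_is_short(query: str, limit: int = 100) -> bool:
    """Figure 1, suggestion (1): the retriever query is under `limit` characters."""
    return len(query) < limit


def query_is_distinct(query: str, previous_queries: List[str]) -> bool:
    """Figure 1, suggestion (2): the query differs from all earlier queries.

    Comparison ignores case and surrounding whitespace (an assumption).
    """
    seen = {q.strip().lower() for q in previous_queries}
    return query.strip().lower() not in seen


def is_json_object(text: str) -> bool:
    """Figure 4, suggestion (1): the output parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def has_no_hashtags(tweet: str) -> bool:
    """Figure 5, suggestion (1): the tweet contains no '#' character."""
    return "#" not in tweet


# Usage with made-up inputs.
print(query_is_short("Who directed the film?"))
print(query_is_distinct("Who directed the film?", ["who directed the film? "]))
print(is_json_object('{"A": "Paris", "B": "Rome"}'))
print(has_no_hashtags("Great answer! #trivia"))
```

Each predicate returns a plain boolean, so it could serve as the condition of either a hard assertion or a suggestion, paired with a message that tells the LM what to fix on retry.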