Prompting Is Programming: A Query Language for Large Language Models

Luca Beurer-Kellner, Marc Fischer, Martin Vechev

TL;DR

LMQL introduces Language Model Programming (LMP), which extends prompting with scripting and output constraints so that a prompt written at the front end also determines how decoding runs at the back end. The LMQL runtime supports eager and partial evaluation of constraints, token masking, and scripted beam search, yielding model-agnostic, constraint-driven decoding that reduces inference cost while retaining or improving accuracy. In case studies on chain-of-thought prompting, interactive prompting, and arithmetic reasoning, the evaluation reports 26-85% cost savings on pay-to-use APIs. The work provides a programmable, cross-model interface for prompt engineering and points toward standardizing LM querying across vendors.

Abstract

Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. At a high level, given an input, a language model can be used to automatically complete the sequence in a statistically likely way. Based on this, users prompt these models with language instructions or examples to implement a variety of downstream tasks. Advanced prompting methods can even imply interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt language models for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction. Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaptation to many tasks while abstracting language model internals and providing high-level semantics. To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model. We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase the accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (26-85% cost savings).
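
The abstract describes constraints over the model output being turned into an inference procedure. One common way to realize this is token masking during decoding. The following is a minimal sketch of that idea only, not LMQL's implementation: `toy_scores` is a hypothetical stand-in for a language model, and `VOCAB`, `digits_only`, and `masked_greedy_decode` are illustrative names that do not come from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB: List[str] = ["cat", "7", "1", "2", " ", "<eos>"]


def toy_scores(prefix: str) -> Dict[str, float]:
    """Stand-in for a language model (assumption): fixed scores that favour
    'cat', with an end-of-sequence score that grows with the prefix length."""
    return {"cat": 3.0, "7": 2.0, "1": 1.5, "2": 1.0, " ": 0.5,
            "<eos>": 0.2 + len(prefix)}


def digits_only(text: str) -> bool:
    """Output constraint: the generated text may contain digits only."""
    return text.isdigit()


def masked_greedy_decode(
    scores: Callable[[str], Dict[str, float]],
    allowed: Callable[[str], bool],
    max_tokens: int = 8,
) -> str:
    """Greedy decoding in which tokens that would violate the constraint
    are masked out before the highest-scoring token is picked."""
    out = ""
    for _ in range(max_tokens):
        step = scores(out)  # one call to the (toy) model per generated token
        candidates = {
            tok: s for tok, s in step.items()
            if tok == "<eos>" or allowed(out + tok)
        }
        best = max(candidates, key=candidates.get)
        if best == "<eos>":
            break
        out += best
    return out


constrained = masked_greedy_decode(toy_scores, digits_only)
unconstrained = masked_greedy_decode(toy_scores, lambda text: True)
print(repr(constrained), repr(unconstrained))
```

With these toy scores the unconstrained decoder picks `cat`, while the masked decoder yields a digits-only string in a single pass. A generate-then-validate loop would instead have to discard the invalid output and query the model again, which is the kind of extra call that constraint-driven decoding aims to avoid.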

Paper Structure

This paper contains 72 sections, 1 theorem, 3 equations, 17 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

(Brzozowski Soundness) Given a query $\mathcal{Q}$, a partial interaction trace $u$, and the corresponding set of allowed tokens $M := \{t \in \mathcal{V} \mid \textsc{Follow}[\text{where}_\mathcal{Q}](u, t) \neq \text{fin}(\bot)\}$, it holds that $T_\mathcal{Q} \subseteq M$, where $T_\mathcal{Q}$ ...
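
The set $M$ in the theorem keeps every token whose continuation is not definitively ruled out by the `where` clause. The sketch below illustrates how a set of that shape could be computed. It rests on assumptions: the `(value, is_final)` verdict pairs, the helpers `max_len`, `ends_with`, and `conj`, and the function `allowed_tokens` are hypothetical and only loosely mirror the role of Follow and fin in the statement above. They are not the paper's definitions.

```python
from typing import Callable, List, Tuple

# (value, is_final): is_final=True means no continuation can change the value.
Verdict = Tuple[bool, bool]
Check = Callable[[str], Verdict]


def max_len(limit: int) -> Check:
    """Length bound. Text only grows, so a violation is final,
    while satisfaction can still be lost later."""
    def check(text: str) -> Verdict:
        violated = len(text) > limit
        return (not violated, violated)
    return check


def ends_with(suffix: str) -> Check:
    """Suffix constraint. Never final, since later tokens can change it."""
    def check(text: str) -> Verdict:
        return (text.endswith(suffix), False)
    return check


def conj(*checks: Check) -> Check:
    """Conjunction: definitively false as soon as one operand is."""
    def check(text: str) -> Verdict:
        verdicts = [c(text) for c in checks]
        if any(final and not value for value, final in verdicts):
            return (False, True)
        return (all(v for v, _ in verdicts), all(f for _, f in verdicts))
    return check


def allowed_tokens(check: Check, vocab: List[str], trace: str) -> List[str]:
    """Keep every token t whose continuation trace + t is not
    definitively violating, analogous in shape to the set M."""
    return [t for t in vocab if check(trace + t) != (False, True)]


where = conj(max_len(3), ends_with("."))
mask = allowed_tokens(where, ["a", "bb", ".", "ccc"], "ab")
print(mask)
```

In this toy setting the mask keeps `a` even though `aba` can no longer be extended to end with a period within three characters. The mask is therefore an over-approximation: it never discards a token that could still be valid, but it may keep some that turn out not to be, which is consistent with the theorem stating a subset relation rather than an equality.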

Figures (17)

  • Figure 1: Two LMQL programs that demonstrate core features like scripted prompting, eager output constraining and validation, and prompting with control flow.
  • Figure 2: Tokenization of a sentence.
  • Figure 3: Example of few-shot prompting; originally presented in Brown et al. (2020).
  • Figure 4: Example of a meta prompt for the circumference of the earth and its scripted prompting counterpart.
  • Figure 5: Syntax of LMQL. Brackets denote optional elements. The syntax is generally Python-based.
  • ...and 12 more figures

Theorems & Definitions (1)

  • Theorem 1 (Brzozowski Soundness)