Constrained Adaptive Rejection Sampling

Paweł Parys, Sairam Vaidya, Taylor Berg-Kirkpatrick, Loris D'Antoni

TL;DR

Constrained Adaptive Rejection Sampling (CARS) introduces an exact method for sampling from a language model (LM) under hard constraints, with efficiency that is amortized across samples. CARS records invalid prefixes in a trie and adaptively reshapes the sampling distribution (denoted $R^\mathcal{W}$). This preserves the LM's true constrained distribution while sharply reducing wasted samples. Across grammar-based fuzzing, molecular generation, and PDDL planning, CARS achieves higher acceptance rates and diversity, lower KL divergence to the true distribution, and better downstream performance (e.g., code coverage, synthesis feasibility) than both exact and approximate baselines. The approach offers a principled, scalable solution for constraint-compliant generation in domains that require both fidelity and sample efficiency.

Abstract

Language models (LMs) are increasingly used in applications where generated outputs must satisfy strict semantic or syntactic constraints. Existing approaches to constrained generation fall along a spectrum: greedy constrained decoding (GCD) methods enforce validity during decoding but distort the LM's distribution, while rejection sampling (RS) preserves fidelity but wastes computation by discarding invalid outputs. Both extremes are problematic in domains such as program fuzzing, where samples must be both valid and diverse. We present Constrained Adaptive Rejection Sampling (CARS), an approach that strictly improves the sample efficiency of RS without distorting the distribution. CARS begins with unconstrained LM sampling and adaptively rules out constraint-violating continuations by recording them in a trie and subtracting their probability mass from future draws. This adaptive pruning ensures that prefixes proven invalid are never revisited, that acceptance rates improve monotonically, and that the resulting samples exactly follow the constrained distribution. In experiments across several domains, including program fuzzing and molecular generation, CARS consistently achieves higher efficiency (measured as the number of LM forward passes per valid sample) while also producing more diverse samples than both GCD and methods that approximate the LM's distribution.
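The trie-based pruning described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `TrieNode`, `cars_sample`, `toy_lm`, `prefix_ok`, and `complete_ok` are hypothetical, the toy LM ignores its prefix, and the toy constraint (the strings 0, 0+0, 0+0+0, ...) only loosely mirrors the arithmetic example. Each trie node caches the LM's next-token distribution and the share of its continuation mass that has not yet been ruled out. Tokens are drawn in proportion to LM probability times that share. A continuation that fails the checker gets mass zero, the update is propagated to the root, and the attempt restarts.

```python
import random
from typing import Callable, Dict, Optional, Tuple

EOS = "<eos>"
Prefix = Tuple[str, ...]


class TrieNode:
    """A prefix in the trie of explored continuations (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.probs: Optional[Dict[str, float]] = None  # cached next-token distribution
        self.mass = 1.0  # share of this prefix's continuation mass not yet ruled out

    def child_mass(self, tok: str) -> float:
        # Unexplored continuations have not been ruled out, so they keep mass 1.
        return self.children[tok].mass if tok in self.children else 1.0

    def recompute_mass(self) -> None:
        assert self.probs is not None
        self.mass = sum(p * self.child_mass(t) for t, p in self.probs.items())


def cars_sample(
    lm: Callable[[Prefix], Dict[str, float]],
    prefix_ok: Callable[[Prefix], bool],
    complete_ok: Callable[[Prefix], bool],
    root: TrieNode,
    rng: random.Random,
) -> Prefix:
    """Draw one sample from the LM distribution restricted to valid strings."""
    while True:  # each pass is one rejection-sampling attempt from the root
        node, prefix, path = root, (), [root]
        while True:
            if node.probs is None:
                node.probs = lm(prefix)  # LM forward pass, cached in the trie
            tokens = list(node.probs)
            # Reweight by remaining mass so ruled-out prefixes are never drawn again.
            weights = [node.probs[t] * node.child_mass(t) for t in tokens]
            tok = rng.choices(tokens, weights=weights, k=1)[0]
            if tok not in node.children:
                node.children[tok] = TrieNode()
            child = node.children[tok]
            ok = complete_ok(prefix) if tok == EOS else prefix_ok(prefix + (tok,))
            if not ok:
                child.mass = 0.0  # subtract this continuation's mass ...
                for ancestor in reversed(path):  # ... from every ancestor up to the root
                    ancestor.recompute_mass()
                break  # reject and restart
            if tok == EOS:
                return prefix
            node, prefix = child, prefix + (tok,)
            path.append(node)


# Hypothetical toy LM and constraint: valid strings are 0, 0+0, 0+0+0, ...
def toy_lm(prefix: Prefix) -> Dict[str, float]:
    return {"0": 0.3, "+": 0.45, EOS: 0.25}


def prefix_ok(prefix: Prefix) -> bool:
    # Tokens must alternate 0, +, 0, ... starting with 0.
    return all(tok == ("0" if i % 2 == 0 else "+") for i, tok in enumerate(prefix))


def complete_ok(prefix: Prefix) -> bool:
    return len(prefix) % 2 == 1 and prefix_ok(prefix)


if __name__ == "__main__":
    rng = random.Random(0)
    root = TrieNode()  # shared across draws, so pruning is amortized
    draws = [cars_sample(toy_lm, prefix_ok, complete_ok, root, rng) for _ in range(10000)]
    print(sum(d == ("0",) for d in draws) / len(draws))
```

Under this toy LM the exact constrained probability of the one-token string 0 is $1 - 0.3 \cdot 0.45 = 0.865$, so the printed frequency should be close to that value. Sharing one `root` across draws is what makes the pruning pay off: once the invalid continuations of a prefix are recorded, later attempts through that prefix are not rejected there again.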

Paper Structure

This paper contains 75 sections, 1 theorem, 5 equations, 15 figures, 14 tables, and 1 algorithm.

Key Result

Theorem 1

The CARS algorithm samples an element of $\mathcal{L}$ according to the target distribution $P^\mathcal{L}$. Moreover, the adaptive updates performed in Line 6 of the algorithm monotonically increase the probability that some sequence is yielded in Line 5 at subsequent loop iterations.
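Both claims of the theorem can be seen through a short argument. The sketch below uses notation that this summary does not define, so it is an assumption: $\mathcal{W}$ is the set of prefixes recorded as invalid, $\mathcal{S}_\mathcal{W}$ is the set of complete sequences with no prefix in $\mathcal{W}$, $R^\mathcal{W}$ is read as $P$ renormalized to $\mathcal{S}_\mathcal{W}$, and $P^\mathcal{L}(y) = P(y)/P(\mathcal{L})$ for $y \in \mathcal{L}$. The paper's formal definitions may differ. Because a prefix enters $\mathcal{W}$ only once it is shown to have no valid completion, $\mathcal{L} \subseteq \mathcal{S}_\mathcal{W}$ at every iteration, and therefore

$$
R^\mathcal{W}(y) = \frac{P(y)\,\mathbf{1}[y \in \mathcal{S}_\mathcal{W}]}{P(\mathcal{S}_\mathcal{W})}, \qquad
R^\mathcal{W}(\mathcal{L}) = \frac{P(\mathcal{L})}{P(\mathcal{S}_\mathcal{W})}, \qquad
R^\mathcal{W}(y \mid y \in \mathcal{L}) = \frac{P(y)/P(\mathcal{S}_\mathcal{W})}{P(\mathcal{L})/P(\mathcal{S}_\mathcal{W})} = \frac{P(y)}{P(\mathcal{L})} = P^\mathcal{L}(y).
$$

An accepted sample therefore follows $P^\mathcal{L}$ regardless of the current $\mathcal{W}$. Adding a prefix to $\mathcal{W}$ can only shrink $\mathcal{S}_\mathcal{W}$, so $P(\mathcal{S}_\mathcal{W})$ never increases and the acceptance probability $P(\mathcal{L})/P(\mathcal{S}_\mathcal{W})$ never decreases. It strictly increases whenever the newly pruned prefix carried positive probability.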

Figures (15)

  • Figure 1: Invalid sample 0++ for the arithmetic grammar in Example 1. The sequence ending in the blue token is invalid for both ARS and CARS, whereas the sequences ending in orange tokens are considered invalid only by CARS. With the example probabilities in parentheses, ARS reduces the future rejection probability by $0.09 \approx 0.45 \cdot 0.45 \cdot 0.45$, whereas CARS reduces it by $0.63 \approx 0.3 + 0.45 \cdot 0.55 + 0.45 \cdot 0.45 \cdot 0.45$.
  • Figure 2: XML benchmark with grammar: (a) KL divergence for different sampling methods. (b) Branch coverage achieved by fuzzing with generated seeds. The displayed KL for RSFT and CARS is non-zero, even though these methods are exact, because we compute an empirical estimate of KL. The vertical dashed line marks the average number of steps MCMC would require to match the sample efficiency of CARS (i.e., CARS averages 2.25 LM calls per sample).
  • Figure 3: (a) Prompt given to a LM to generate seed test cases for fuzzing the XML parser. (b) Simplified version of the XML grammar written in Lark notation. The goal of the problem is to generate multiple diverse seeds that trigger different code paths in the library being tested.
  • Figure 4: KL divergence comparison across fuzzing benchmarks (without grammar condition). CARS and RSFT show consistently lower divergence than approximate methods, confirming distributional fidelity, while MCMC shows convergence behavior over steps.
  • Figure 5: Parse tree for ethyl acrylate (C=CC(=O)OCC). The tree shows how grammar constraints enforce the acrylate functional group (highlighted) while permitting variation in alkyl substituents. Purple nodes represent non-terminals, and blue italics text displays the actual SMILES tokens.
  • ...and 10 more figures
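The Figure 2 caption notes that exact samplers such as RSFT and CARS still show non-zero KL because KL is estimated from samples. A minimal Python sketch of one common estimator, the plug-in estimate $\mathrm{KL}(\hat{Q} \,\|\, P)$ computed from the sample histogram $\hat{Q}$, shows why. The function `empirical_kl` and the three-outcome target are hypothetical, and the paper's exact estimator and KL direction are not specified here.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, List


def empirical_kl(samples: List[Hashable], target: Dict[Hashable, float]) -> float:
    """Plug-in estimate of KL(Q_hat || P), where Q_hat is the sample histogram."""
    counts = Counter(samples)
    n = len(samples)
    # Only observed outcomes contribute, since 0 * log 0 is taken to be 0.
    return sum((c / n) * math.log((c / n) / target[x]) for x, c in counts.items())


# Hypothetical target distribution; samples are drawn exactly from it.
target = {"a": 0.5, "b": 0.3, "c": 0.2}
rng = random.Random(1)
samples = rng.choices(list(target), weights=list(target.values()), k=1000)
kl = empirical_kl(samples, target)
print(kl)
```

Although the samples come exactly from `target`, the estimate is positive whenever the sample frequencies differ from the target probabilities. It shrinks roughly in proportion to $1/n$ as the sample count $n$ grows.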

Theorems & Definitions (3)

  • Example 1: Arithmetic Expressions
  • Theorem 1
  • Proof