Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs
Alexander K. Lew, Tan Zhi-Xuan, Gabriel Grand, Vikash K. Mansinghka
TL;DR
This paper addresses the problem of reliably constraining LLM outputs at inference time, where prompting and fine-tuning alone fall short. It proposes sequential Monte Carlo (SMC) steering, which recasts generation as posterior inference in Feynman-Kac Transformer models and replaces standard decoding with particle-based SMC. The key contributions are (i) the Feynman-Kac formulation of constrained generation, (ii) the SMC steering algorithm, with shared Transformer caching and a without-replacement resampling strategy, and (iii) the LLaMPPL library for writing language-model probabilistic programs and automating steering. At a computational cost comparable to that of beam search, the approach samples from constrained posteriors and supports tasks such as generation under hard constraints, infilling, and prompt intersection; better proposals improve sample quality. The work thus provides a framework for controlling LLM outputs through probabilistic inference, with tasks specified and composed modularly as programs.
Abstract
Even after fine-tuning and reinforcement learning, large language models (LLMs) can be difficult, if not impossible, to control reliably with prompts alone. We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of LLMs, called sequential Monte Carlo (SMC) steering. The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models, and replace standard decoding with sequential Monte Carlo inference. For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks, including infilling, generation under syntactic constraints, and prompt intersection. To facilitate experimentation with SMC steering, we present a probabilistic programming library, LLaMPPL (https://github.com/probcomp/hfppl), for concisely specifying new generation tasks as language model probabilistic programs, and automating steering of LLaMA-family Transformers.
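As a rough illustration of the key idea, the following minimal Python sketch runs sequential Monte Carlo over a toy next-token distribution under a hard constraint. Everything in it is an assumption made for illustration: the `TOY_LM` table stands in for an LLM, the `potential` function and its word-length constraint are hypothetical, the proposal is simply the toy model itself, and resampling is plain multinomial resampling. It does not use the LLaMPPL API and is not the authors' implementation.

```python
import random

# Toy stand-in for an LLM (assumption): a fixed next-token distribution
# that ignores the context entirely.
TOY_LM = {
    "the": 0.20, "cat": 0.20, "sat": 0.15, "on": 0.10,
    "elephant": 0.15, "quietly": 0.10, "<eos>": 0.10,
}


def potential(token, max_len=3):
    """Hypothetical hard constraint as a 0/1 potential: no word longer than max_len."""
    return 1.0 if token == "<eos>" or len(token) <= max_len else 0.0


def smc_steer(num_particles=50, max_steps=5, seed=0):
    """Return (particles, weights) approximating the constrained posterior."""
    rng = random.Random(seed)
    tokens, probs = zip(*TOY_LM.items())
    particles = [[] for _ in range(num_particles)]
    weights = [1.0] * num_particles

    for _ in range(max_steps):
        # Extend: each unfinished particle proposes a token from the toy model,
        # and its weight is multiplied by the constraint potential.
        for i, seq in enumerate(particles):
            if seq and seq[-1] == "<eos>":
                continue  # finished particles are left unchanged
            tok = rng.choices(tokens, weights=probs)[0]
            seq.append(tok)
            weights[i] *= potential(tok)

        total = sum(weights)
        if total == 0.0:
            raise RuntimeError("all particles violated the constraint")

        # Resample when the effective sample size drops below half the particles.
        ess = total ** 2 / sum(w * w for w in weights)
        if ess < num_particles / 2:
            idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
            particles = [list(particles[j]) for j in idx]
            weights = [total / num_particles] * num_particles

    return particles, weights


if __name__ == "__main__":
    final_particles, final_weights = smc_steer()
    for seq, w in zip(final_particles[:5], final_weights[:5]):
        print(round(w, 3), " ".join(seq))
```

Particles that violate the constraint receive zero weight and are discarded at the next resampling step, so computation concentrates on continuations that remain consistent with the constraint. With a real LLM, the sampling call would be replaced by a draw from the model's next-token distribution or from a better proposal.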
