PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Sarat Chandra Bobbili, Ujwal Dinesha, Dheeraj Narasimha, Srinivas Shakkottai
TL;DR
PITA tackles the problem of aligning LLM outputs with user preferences without fine-tuning or pre-trained reward models. It introduces a preference-guided inference-time framework that learns a small guidance policy from user preferences and uses it to modulate the LLM's next-token distribution at decoding time, while keeping the base model frozen. Theoretical results show sub-linear regret under a linear preference model, and experiments on GSM8K, star-graph reasoning, and IMDB sentiment generation show performance competitive with reward-based methods, along with robustness across these task types. The approach reduces computational cost and removes the dependency on reward-model training, offering a data-efficient and practical alternative for LLM alignment in real-world applications.
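The TL;DR mentions a linear preference model. One common instantiation, assumed here purely for illustration, is a Bradley-Terry model in which the probability that response a is preferred over response b is sigmoid(theta · (phi(a) − phi(b))) for some feature map phi. The sketch below fits theta by maximum likelihood from pairwise comparisons. The feature vectors, the simulated annotator, and all function names are hypothetical, and PITA's actual preference model and learning procedure may differ.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_prob(theta, phi_a, phi_b):
    """P(a preferred over b) = sigmoid(theta . (phi_a - phi_b)) (assumed linear Bradley-Terry)."""
    return sigmoid(sum(t * (a - b) for t, a, b in zip(theta, phi_a, phi_b)))


def make_synthetic_pairs(theta_true, n, seed=0):
    """Hypothetical annotator: compares random feature vectors using a hidden theta_true.

    Returns a list of (phi_winner, phi_loser) tuples. Purely illustrative data.
    """
    rng = random.Random(seed)
    dim = len(theta_true)
    pairs = []
    for _ in range(n):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Sample the comparison outcome from the Bradley-Terry probability.
        if rng.random() < preference_prob(theta_true, a, b):
            pairs.append((a, b))
        else:
            pairs.append((b, a))
    return pairs


def fit_linear_preferences(pairs, dim, lr=0.5, epochs=200):
    """Maximum-likelihood estimate of theta via full-batch gradient ascent."""
    theta = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for phi_w, phi_l in pairs:
            p = preference_prob(theta, phi_w, phi_l)
            # d/dtheta log sigmoid(theta . d) = (1 - sigmoid(theta . d)) * d
            for i in range(dim):
                grad[i] += (1.0 - p) * (phi_w[i] - phi_l[i])
        theta = [t + lr * g / len(pairs) for t, g in zip(theta, grad)]
    return theta
```

For example, `fit_linear_preferences(make_synthetic_pairs([2.0, -1.0], n=500), dim=2)` should return a vector pointing roughly along the hidden [2.0, -1.0]. Full-batch gradient ascent is used only for readability; any logistic-regression solver without an intercept, applied to the difference vectors phi(a) − phi(b), fits the same model.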
Abstract
Inference-time alignment enables large language models (LLMs) to generate outputs aligned with end-user preferences without further training. Recent post-training methods achieve this by using small guidance models to modify token generation during inference. These methods typically optimize a reward function with a KL-divergence penalty that keeps the policy close to the original LLM, which serves as the reference policy. A critical limitation, however, is their dependence on a pre-trained reward model, which must be fit to human preference feedback, a process that can be unstable. In contrast, we introduce PITA, a novel framework that integrates preference feedback directly into the LLM's token generation, eliminating the need for a reward model. PITA learns a small preference-based guidance policy to modify token probabilities at inference time without LLM fine-tuning, reducing computational cost and bypassing the pre-trained reward model dependency. The problem is framed as identifying an underlying preference distribution, solved through stochastic search and iterative refinement of the preference-based guidance model. We evaluate PITA across diverse tasks, including mathematical reasoning and sentiment classification, demonstrating its effectiveness in aligning LLM outputs with user preferences.
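The KL-regularized objective described in the abstract, max over pi of E_pi[r] − beta · KL(pi || pi_ref), has the well-known maximizer pi*(y) ∝ pi_ref(y) · exp(r(y) / beta). The sketch below applies that tilt to a single next-token distribution, using a hypothetical per-token guidance score in place of r and an arbitrary beta. It illustrates the generic guided-decoding form that the abstract contrasts against; it is not a reproduction of PITA's preference-based update, whose exact form is not given in this section.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (max-shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_distribution(ref_logits, guidance, beta):
    """Tilt the frozen reference next-token distribution by a guidance score.

    Returns pi(a) proportional to pi_ref(a) * exp(guidance[a] / beta),
    computed as softmax(ref_logits + guidance / beta). `guidance` is a
    hypothetical per-token score, e.g. from a small guidance model.
    """
    return softmax([x + g / beta for x, g in zip(ref_logits, guidance)])


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions (terms with p = 0 contribute 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_objective(pi, ref, guidance, beta):
    """E_pi[guidance] - beta * KL(pi || ref), the KL-regularized objective."""
    expected = sum(p * g for p, g in zip(pi, guidance))
    return expected - beta * kl_divergence(pi, ref)
```

At the optimum the objective equals beta · log Z, where Z = Σ_a pi_ref(a) · exp(g(a) / beta), which makes a convenient numerical sanity check. A small beta lets the guidance score dominate, while a large beta keeps the distribution close to the frozen reference model.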
