Robust Multi-Objective Controlled Decoding of Large Language Models
Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic
TL;DR
This work tackles test-time multi-objective alignment of large language models when the relative importance of the objectives is unknown. It introduces Robust Multi-Objective Decoding (RMOD), a max-min inference framework that finds worst-case objective weights by solving a convex optimization problem and derives the best-response policy analytically, while regularizing toward the reference model with a KL divergence term. A practical RMOD variant, which uses block-wise decoding and weight-approximation techniques, enables low-latency deployment on contemporary LLMs. Empirical results on HH, UltraFeedback, and ValuePrism demonstrate robust, balanced alignment with significant improvements in worst-case rewards compared to baselines, and show favorable latency profiles. The approach offers a principled, inference-time solution to equitable multi-objective control in user- and task-specific contexts, and leaves room for refinement in reward design and value-function accuracy.
Abstract
Test-time alignment of Large Language Models (LLMs) to human preferences offers a flexible way to generate responses aligned to diverse objectives without extensive retraining of LLMs. Existing methods achieve alignment to multiple objectives simultaneously (e.g., instruction-following, helpfulness, conciseness) by optimizing their corresponding reward functions. However, they often rely on predefined weights or optimize for averages, sacrificing one objective for another and leading to unbalanced outcomes. To address this, we introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that optimizes for worst-case rewards. RMOD formalizes the robust decoding problem as a maximin two-player game between reward weights and the sampling policy, and solves for the Nash equilibrium. We show that the game reduces to a convex optimization problem for finding the worst-case weights, while the best-response policy can be computed analytically. We also introduce a practical RMOD variant designed for efficient decoding with contemporary LLMs, incurring minimal computational overhead compared to non-robust Multi-Objective Decoding (MOD) methods. Our experimental results showcase the effectiveness of RMOD in generating responses equitably aligned with diverse objectives, outperforming baselines by up to 20%.
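The reduction described above can be illustrated with a minimal sketch on a toy problem. It is not the authors' implementation. The abstract does not state the exact objective, so the sketch assumes the standard KL-regularized form: for fixed weights w, the best-response policy is proportional to the reference probability times exp(w · V(y) / beta), and substituting it back leaves a log-sum-exp function of w that is convex over the simplex. The function names, the three-candidate toy values, beta, and the exponentiated-gradient solver are all hypothetical choices made for illustration.

```python
import numpy as np

def best_response(ref_probs, values, w, beta):
    """Closed-form policy for fixed weights w (assumed KL-regularized form):
    pi(y) is proportional to ref(y) * exp(w . V(y) / beta)."""
    logits = np.log(ref_probs) + values @ w / beta
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def worst_case_weights(ref_probs, values, beta, steps=1000, lr=0.5):
    """Minimize beta * log sum_y ref(y) * exp(w . V(y) / beta) over the simplex.
    Exponentiated-gradient steps are an illustrative solver choice."""
    n_obj = values.shape[1]
    w = np.full(n_obj, 1.0 / n_obj)
    for _ in range(steps):
        pi = best_response(ref_probs, values, w, beta)
        grad = pi @ values          # gradient = expected value per objective under pi
        w = w * np.exp(-lr * grad)  # multiplicative update keeps w positive
        w /= w.sum()                # renormalize onto the simplex
    return w

def robust_objective(pi, ref_probs, values, beta):
    """Worst-case expected value minus the KL penalty toward the reference."""
    kl = np.sum(pi * np.log(pi / ref_probs))
    return (pi @ values).min() - beta * kl

# Hypothetical toy problem: 3 candidate responses, 2 objectives.
ref_probs = np.full(3, 1.0 / 3.0)
values = np.array([[1.0, 0.0],   # strong on objective 0 only
                   [0.2, 0.9],   # strong on objective 1
                   [0.5, 0.5]])  # balanced
beta = 0.5

w_star = worst_case_weights(ref_probs, values, beta)
pi_star = best_response(ref_probs, values, w_star, beta)
pi_avg = best_response(ref_probs, values, np.array([0.5, 0.5]), beta)

print("worst-case weights:", w_star)
print("expected values, robust policy:", pi_star @ values)
print("expected values, uniform weights:", pi_avg @ values)
```

In this toy case the worst-case weights land in the interior of the simplex, so the robust policy equalizes the expected values of the two objectives, whereas uniform weights leave one objective behind. A real decoder would apply the same two steps to per-token or per-block value estimates rather than to a fixed list of candidate responses.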
