Collab: Controlled Decoding using Mixture of Agents for LLM Alignment

Souradip Chakraborty, Sujay Bhatt, Udari Madhushani Sehwag, Soumya Suvra Ghosal, Jiahao Qiu, Mengdi Wang, Dinesh Manocha, Furong Huang, Alec Koppel, Sumitra Ganesh

TL;DR

Collab offers a training-free, test-time alignment approach for LLMs by mixing multiple pre-aligned agents and dynamically switching between them at the token level via an implicit Q-function-guided policy. It formalizes decoding as a KL-regularized, token-level MDP and extends single-agent KL control to a multi-agent setting with a principled objective $J^{\pi_j}_{\text{target}}$. The paper provides a sub-optimality bound that ties performance to reward differences and KL regularization, and demonstrates substantial empirical gains over single-agent decoding and prior SoTA methods across diverse tasks and datasets. By reusing off-the-shelf aligned policies and coordinating them during generation, Collab enables scalable, deployment-friendly test-time alignment without fine-tuning billions of parameters.
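For orientation, the single-agent KL-regularized decoding step that the summary builds on has a well-known closed form. The notation here ($\pi_{\text{ref}}$ for a reference policy, $Q$ for a token-level action-value estimate, $\beta > 0$ for the regularization weight) is assumed for illustration and may differ from the paper's own symbols.

$$
\pi_\beta(\cdot \mid \mathbf{s}_t) = \arg\max_{\pi} \; \mathbb{E}_{z \sim \pi(\cdot \mid \mathbf{s}_t)}\big[Q(\mathbf{s}_t, z)\big] - \beta \, \mathrm{KL}\big(\pi(\cdot \mid \mathbf{s}_t) \,\|\, \pi_{\text{ref}}(\cdot \mid \mathbf{s}_t)\big)
$$

$$
\pi_\beta(z \mid \mathbf{s}_t) = \frac{\pi_{\text{ref}}(z \mid \mathbf{s}_t) \exp\big(Q(\mathbf{s}_t, z)/\beta\big)}{\sum_{z'} \pi_{\text{ref}}(z' \mid \mathbf{s}_t) \exp\big(Q(\mathbf{s}_t, z')/\beta\big)}
$$

As $\beta \to 0$ the policy concentrates on the highest-$Q$ tokens in the support of $\pi_{\text{ref}}$, and as $\beta \to \infty$ it reverts to $\pi_{\text{ref}}$. In a multi-agent reading, each pre-aligned agent $\pi_j$ would presumably take the place of $\pi_{\text{ref}}$.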

Abstract

Alignment of large language models (LLMs) is crucial for safe and trustworthy deployment in applications. Reinforcement learning from human feedback (RLHF) has emerged as an effective technique to align LLMs to human preferences and broader utilities, but it requires updating billions of model parameters, which is computationally expensive. Controlled decoding, by contrast, provides a mechanism for aligning a model at inference time without retraining. However, single-agent decoding approaches often struggle to adapt to diverse tasks because of the complexity and variability inherent in these tasks. To strengthen test-time performance with respect to the target task, we propose a mixture-of-agents decoding strategy that leverages existing off-the-shelf aligned LLM policies. Treating each prior policy as an agent in the spirit of mixture-of-agents collaboration, we develop a decoding method that allows for inference-time alignment through a token-level selection strategy among multiple agents. For each token, the most suitable LLM is dynamically chosen from a pool of models based on a long-term utility metric. This policy-switching mechanism ensures optimal model selection at each step, enabling efficient collaboration and alignment among LLMs during decoding. Theoretical analysis of our proposed algorithm establishes optimal performance with respect to the target task, represented via a target reward, for the given off-the-shelf models. We conduct comprehensive empirical evaluations with open-source aligned models on diverse tasks and preferences, which demonstrate the merits of this approach over single-agent decoding baselines. Notably, Collab surpasses the current SoTA decoding strategy, achieving an improvement of up to 1.56x in average reward and 71.89% in GPT-4 based win-tie rate.
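The token-level selection described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `collab_step`, `collab_decode`, and `make_toy_q`, the agent call signature, the top-k candidate rule, and the toy scoring function are all hypothetical. In practice the agents would be aligned LLMs and the score would be a long-term utility estimate with respect to the target reward.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Token = str
State = Tuple[Token, ...]
# An agent maps the current token prefix to a next-token distribution.
Agent = Callable[[State], Dict[Token, float]]
# Hypothetical stand-in for a long-term utility estimate of (state, token, agent).
QFunction = Callable[[State, Token, int], float]


def collab_step(state: State, agents: Sequence[Agent], q_fn: QFunction,
                top_k: int = 2) -> Tuple[Token, int]:
    """Score each agent's top-k candidate tokens and return the best (token, agent index)."""
    best_score, best_token, best_agent = float("-inf"), "", -1
    for j, agent in enumerate(agents):
        probs = agent(state)
        candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
        for z in candidates:
            score = q_fn(state, z, j)
            if score > best_score:  # ties keep the earlier agent
                best_score, best_token, best_agent = score, z, j
    return best_token, best_agent


def collab_decode(prompt: Sequence[Token], agents: Sequence[Agent], q_fn: QFunction,
                  max_new_tokens: int = 3, eos: Token = "<eos>") -> Tuple[List[Token], List[int]]:
    """Greedy token-by-token decoding that may switch agents at every step."""
    state: State = tuple(prompt)
    chosen: List[int] = []
    for _ in range(max_new_tokens):
        token, j = collab_step(state, agents, q_fn)
        state += (token,)
        chosen.append(j)
        if token == eos:
            break
    return list(state), chosen


# Toy agents with fixed next-token distributions (assumption: real agents are LLMs).
def chat_agent(state: State) -> Dict[Token, float]:
    return {"the": 0.5, "reaction": 0.3, "<eos>": 0.2}


def chem_agent(state: State) -> Dict[Token, float]:
    return {"exothermic": 0.6, "reaction": 0.3, "<eos>": 0.1}


def make_toy_q(target: Sequence[Token], prompt_len: int) -> QFunction:
    """Toy utility: 1.0 if the token matches the target continuation at this position."""
    def q(state: State, token: Token, agent_idx: int) -> float:
        pos = len(state) - prompt_len
        return 1.0 if pos < len(target) and target[pos] == token else 0.0
    return q


if __name__ == "__main__":
    q_fn = make_toy_q(("the", "exothermic", "reaction"), prompt_len=1)
    tokens, chosen = collab_decode(["describe"], [chat_agent, chem_agent], q_fn)
    print(tokens)  # ['describe', 'the', 'exothermic', 'reaction']
    print(chosen)  # [0, 1, 0]
```

In the toy run, the chat agent supplies "the" and "reaction" while the chemistry agent supplies "exothermic", so the decoder switches agents mid-sentence. The argmax over agents is used here for simplicity; a sampling-based selection would be an equally plausible variant.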

Paper Structure

This paper contains 21 sections, 2 theorems, 25 equations, 4 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Let $\Pi = \{\pi_1, \pi_2, \dots, \pi_K\}$ be a set of pre-trained policies, each aligned to a latent reward function $r_j$, and $\pi_{\text{alg}}$ be the policy obtained by the multi-agent decoding strategy. Assume that the optimal policy for the target reward function $r^*$ is $\pi^*$. Then, the s where $\delta_{*j} = \max_{\tau} |r_{\text{target}}([\mathbf{s}_t,z], \tau) - r_j([\mathbf{s}_t,z],

Figures (4)

  • Figure 1: The figure illustrates the optimal coordination between agents for response generation via switching, where Agent1 is a ChatAgent and Agent2 is a Chemical-Expert. In this collaborative response, the agents are switching smoothly at the word and phrase level to deliver a more detailed and complete response than they could individually. The switching demonstrates how both agents complement each other in explaining the complex process.
  • Figure 2: In the above plots, we present the normalized average reward values obtained using the corresponding setup outlined in Table \ref{tab:setup_indv}. Agent-I and Agent-II refer to the average reward obtained by the individual models with SoTA decoding. For the BoN agents sampling, we perform vanilla logit-based sampling using individual agents and select the best response with respect to the target reward. Our analysis reveals that across all setups, Collab consistently outperforms the other baselines summarized in Table \ref{tab:setup_indv}, demonstrating the importance of multi-agent decoding.
  • Figure 3: In the above plots, we present the diversity and coherence values obtained using the corresponding setup as outlined in Table \ref{tab:setup_indv}. We clearly observe that the responses generated using Collab consistently outperform the other baselines in terms of both diversity and coherence. This also indicates that switching between agents with Implicit-Q helps improve the overall quality of the responses.
  • Figure 4: Left. The bar plot highlights the importance of using diverse agents for enhanced decoding; employing different but non-diverse agents results in poor performance. Right. The visualization shows the improvement in average reward as the number of diverse agents increases.
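The best-of-N (BoN) baseline mentioned in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: `best_of_n`, the agent call signature, `reward_fn`, and `n_per_agent` are hypothetical names, and the constant-output agents stand in for sampling from real models.

```python
import random
from typing import Callable, List, Sequence

# An agent samples one full response for a prompt (assumption: real agents are LLMs).
Agent = Callable[[str, random.Random], str]
RewardFn = Callable[[str, str], float]


def best_of_n(prompt: str, agents: Sequence[Agent], reward_fn: RewardFn,
              n_per_agent: int = 4, seed: int = 0) -> str:
    """Sample n responses from every agent and keep the one with the highest target reward."""
    rng = random.Random(seed)
    candidates: List[str] = [
        agent(prompt, rng) for agent in agents for _ in range(n_per_agent)
    ]
    return max(candidates, key=lambda response: reward_fn(prompt, response))


if __name__ == "__main__":
    # Toy agents and a toy reward (response length), purely for illustration.
    agent_one: Agent = lambda prompt, rng: "short"
    agent_two: Agent = lambda prompt, rng: "a longer answer"
    length_reward: RewardFn = lambda prompt, response: float(len(response))
    print(best_of_n("q", [agent_one, agent_two], length_reward))  # a longer answer
```

Unlike token-level switching, this baseline selects among complete responses, so it cannot combine partial contributions from different agents within one response.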

Theorems & Definitions (2)

  • Theorem 1: Sub-optimality Bound of Multi-Agent Decoding Algorithm
  • Lemma 1