Collaborate, Deliberate, Evaluate: How LLM Alignment Affects Coordinated Multi-Agent Outcomes

Abhijnan Nath, Carine Graff, Nikhil Krishnaswamy

TL;DR

This work investigates how LLM alignment methods influence coordinated multi-agent outcomes by modeling interventions as friction in a Modified-Action MDP ($\mathcal{M}_f$) and evaluating them via roleplay on two tasks: the DeliData Wason Card Task and the Weights Task. The authors introduce FAAF, a friction-aware alignment approach, and show it outperforms traditional baselines (SFT, DPO, IPO, PPO, BC) in terms of common-ground convergence and task accuracy, especially under action-modification conditions where collaborators can reinterpret or ignore interventions. Through data generation, human validation, and an extensive roleplay evaluation, they demonstrate that incorporating friction and explicit modeling of action modification yields more robust, deliberative collaboration and better task outcomes. The results highlight the importance of evaluating alignment in realistic, long-horizon, multi-agent settings and suggest friction-enabled strategies can improve reliability and accountability in human-AI collaboration. These findings have practical implications for deploying AI collaborators in complex teamwork, with potential impact on collaborative reasoning, decision support, and AI governance.

Abstract

As Large Language Models (LLMs) get integrated into diverse workflows, they are increasingly being regarded as "collaborators" with humans, and required to work in coordination with other AI systems. If such AI collaborators are to reliably coordinate their actions and behaviors with humans or other AIs, their properties and behaviors over multi-turn interactions must be known and predictable. This paper examines how different alignment methods affect LLM agents' effectiveness as partners in multi-turn, multi-party collaborations. We study this question through the lens of intervention agents that insert themselves into group dialogues not to provide answers, but to encourage the collaborative group to slow down and reflect upon their reasoning for deliberative decision-making. Common alignment techniques are typically developed under simplified single-user settings and assume the optimality of the underlying token MDP. Using the theoretical lens of the modified-action MDP, we show how they do not account for the dynamics of long-horizon multi-party interactions. We present a novel roleplay simulation methodology, where we align LLMs according to different methods and then deploy them in collaborative task dialogues to quantify how interventions affect the trajectory of group collaboration, belief alignment, and coordination. Our results show that an intervention agent that is robust to action modification significantly outperforms common alignment baselines in supporting correct task outcomes.

Paper Structure

This paper contains 36 sections, 9 theorems, 11 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Let $\Psi: [0,1] \rightarrow \mathbb{R}$ be any non-decreasing function and $\beta > 0$ be a temperature parameter. Let $P_A(a|s,\pi^I) = \sum_{a' \in A} \pi^I(a'|s) \cdot \pi^C(a|s,a')$ represent modifications to the probability distribution over the action space by a collaborator policy $\pi^C$ … where $Q^I$ satisfies the Bellman optimality equation for the underlying MDP $\mathcal{M}$. Thus …
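
The definition of $P_A(a|s,\pi^I)$ in the theorem statement is a marginalization over the intervention agent's intended actions $a'$. The following is a minimal sketch for a single fixed state, assuming a small discrete action space; the function name `modified_action_distribution` and all numbers are hypothetical illustrations, not values or code from the paper.

```python
def modified_action_distribution(pi_i, pi_c):
    """Compute P_A(a|s) = sum_{a'} pi_I(a'|s) * pi_C(a|s,a') for one state s.

    pi_i: probabilities over the intervention agent's intended actions a'.
    pi_c: row-stochastic matrix where pi_c[a_prime][a] = pi_C(a | s, a'),
          i.e. how the collaborator maps intended action a' to realized action a.
    """
    n_realized = len(pi_c[0])
    return [
        sum(pi_i[a_prime] * pi_c[a_prime][a] for a_prime in range(len(pi_i)))
        for a in range(n_realized)
    ]


# Hypothetical toy values (assumptions for illustration only).
pi_i = [0.7, 0.3]                      # intervention policy over two actions
identity = [[1.0, 0.0], [0.0, 1.0]]    # collaborator passes actions through
modifier = [[0.6, 0.4], [0.1, 0.9]]    # collaborator sometimes reinterprets

p_faithful = modified_action_distribution(pi_i, identity)
p_modified = modified_action_distribution(pi_i, modifier)
```

With the identity collaborator, the realized distribution equals $\pi^I$; with the modifying collaborator, probability mass shifts between actions even though the intervention policy is unchanged. This is the gap between intended and realized actions that the modified-action MDP framing is meant to capture.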

Figures (8)

  • Figure 1: High-level overview of our agent roleplay and evaluation framework. Collaborate [L]: collaborator agents collaborate to complete tasks with an intervention agent in the loop to redirect the dialogue toward reflective reasoning rather than naive acceptance of assertions. Deliberate [C]: Sample collaborative roleplay from the DeliData Wason Card task (Karadzhov et al., 2023) with successful task completion, and "frictive state" description at top. Evaluate [R]: Common ground convergence and task outcomes with interventions provided by differently-aligned agents.
  • Figure 2: Oracle Friction Agent ($\mathcal{O}$) roleplay prompt.
  • Figure 3: Collaborator Agent ($\pi^C$) Final Turn Prompt for resolving the card selection task, incorporating friction agent input and structured output fields for participant reasoning, final submission, and decision process.
  • Figure 4: Collaborator agent ($\pi^C$) final-turn prompt used to elicit the group’s conclusive decision in the Wason Card Selection task. This turn does not apply the MAMDP instruction; the purple MAMDP line used in intermediate turns is intentionally omitted here. See Table \ref{tab:wtd_combined_results} for results and Figure \ref{fig:wason_unified_prompt_items} for the unified turn-level prompt used earlier in the dialogue.
  • Figure 5: Collaborator agent ($\pi^C$) continuation prompt for the Wason Card Selection task. In the Standard setting (turns $N{=}1$–$9$), the purple instruction is omitted. In the MAMDP setting, the purple line is included verbatim while all other content remains unchanged. The final submission at $N{=}10$ uses a separate prompt (see Figure \ref{fig:deli_final_turnlevel_prompt}). At turn $N{=}1$, we prepend the bootstrap dialogue from the original human conversations to [Current Dialogue].
  • ...and 3 more figures

Theorems & Definitions (10)

  • Example 1: Action Modification in DeliData Wason Card Task
  • Theorem 1: $\Psi$-Preference Optimization in Collaborative MAMDPs
  • Lemma 1: Vanishing Gradient of the Frictive State
  • Lemma 2: Token-Level IPO Equivalence
  • Lemma 3: Token-to-Intervention Bellman Completeness
  • Theorem 2: $\Psi$-Preference Optimization in Collaborative MAMDPs
  • Proposition 1: DPO Bellman Optimality in MAMDPs
  • Lemma 4: Token-Level Q-function Equivalence
  • Lemma 5: Vanishing Gradient of Frictive State $\phi$
  • Corollary 1