Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives
Romain Cosentino, Sarath Shekkizhar, Adam Earle
TL;DR
This work presents a mechanistic analysis of two in-context optimizers, each built on linear self-attention (LSA), that alternately update from each other's outputs under potentially misaligned objectives. It establishes a theoretical framework in which asymptotic errors depend on the objective gap and the prompt-induced geometry of each agent: under fixed objectives the agents settle on biased plateaus, and the residual error grows monotonically as misalignment increases. It further shows that enabling turn-by-turn adaptation via a helper agent can realize Newton-like acceleration, turning a fragile interaction into cooperative optimization. The authors validate the theory with both trained LSA agents and GPT-5-mini experiments, and offer practical design principles for aligning objectives, controlling prompt geometry, and enabling adaptive collaboration to improve robustness in multi-agent LLM systems.
Abstract
We develop and analyze a theoretical framework for agent-to-agent interactions in a simplified in-context linear regression setting. In our model, each agent is instantiated as a single-layer transformer with linear self-attention (LSA) trained to implement gradient-descent-like updates on a quadratic regression objective from in-context examples. We then study the coupled dynamics when two such LSA agents alternately update from each other's outputs under potentially misaligned fixed objectives. Within this framework, we characterize the generation dynamics and show that misalignment leads to a biased equilibrium where neither agent reaches its target, with residual errors predictable from the objective gap and the prompt-induced geometry. We further contrast this fixed-objective regime with an adaptive multi-agent setting, wherein a helper agent updates a turn-based objective to implement a Newton-like step for the main agent, eliminating the plateau and accelerating its convergence. Experiments with trained LSA agents, as well as black-box GPT-5-mini runs on in-context linear regression tasks, are consistent with our theoretical predictions within this simplified setting. We view our framework as a mechanistic account that links prompt geometry and objective misalignment to stability, bias, and robustness, and as a stepping stone toward analyzing more realistic multi-agent LLM systems.
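The biased equilibrium described above can be illustrated with a small toy simulation. The sketch below is not the trained LSA model or the helper-agent construction from this work: it assumes each agent takes one plain gradient step per turn on its own quadratic objective, and the 2x2 matrices `S_A` and `S_B`, the targets, the step size `eta`, and all function names are hypothetical choices made only for illustration. The Newton step is included just to show why, for a quadratic, a Newton-like update reaches the target directly instead of stalling.

```python
# Toy sketch (assumption: plain gradient steps stand in for the LSA agents).
# Each agent i minimizes f_i(w) = 0.5 * (w - w_i)^T S_i (w - w_i), where S_i
# is a stand-in for the prompt-induced geometry (e.g. X_i^T X_i / n).

def matvec(S, v):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [S[0][0] * v[0] + S[0][1] * v[1],
            S[1][0] * v[0] + S[1][1] * v[1]]

def gd_step(w, target, S, eta):
    """One gradient step on f(w) = 0.5 (w - target)^T S (w - target)."""
    g = matvec(S, [w[0] - target[0], w[1] - target[1]])
    return [w[0] - eta * g[0], w[1] - eta * g[1]]

def newton_step(w, target, S):
    """Newton step w - S^{-1} grad f(w); exact in one step for a quadratic."""
    g = matvec(S, [w[0] - target[0], w[1] - target[1]])
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    d = matvec(S_inv, g)
    return [w[0] - d[0], w[1] - d[1]]

def dist(u, v):
    """Euclidean distance between two length-2 vectors."""
    return ((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2) ** 0.5

def plateau_error(gap, rounds=200, eta=0.3):
    """Agent A's residual error after many alternating turns with agent B."""
    S_A = [[2.0, 0.5], [0.5, 1.0]]      # hypothetical geometry for agent A
    S_B = [[1.0, -0.3], [-0.3, 1.5]]    # hypothetical geometry for agent B
    w_A = [1.0, -1.0]                   # agent A's target
    w_B = [w_A[0] + gap, w_A[1] + gap]  # agent B's target, shifted by `gap`
    w = [0.0, 0.0]
    for _ in range(rounds):
        w = gd_step(w, w_A, S_A, eta)   # A updates from B's last output
        w = gd_step(w, w_B, S_B, eta)   # B updates from A's output
    return dist(w, w_A)

if __name__ == "__main__":
    for gap in (0.0, 0.5, 1.0):
        print(gap, plateau_error(gap))
    print(newton_step([0.0, 0.0], [1.0, -1.0], [[2.0, 0.5], [0.5, 1.0]]))
```

In this toy setting the alternating map is an affine contraction, so the iterates settle at a fixed point whose distance from agent A's target is zero when the targets coincide and scales linearly with the gap otherwise. The size of that residual also depends on `S_A` and `S_B`, which loosely mirrors the dependence on prompt-induced geometry stated in the abstract.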
