Emergent Convergence in Multi-Agent LLM Annotation
Angelina Parfenova, Alexander Denzler, Juergen Pfeffer
TL;DR
This work introduces a scalable multi-agent LLM annotation framework, run over multiple discussion rounds, to study emergent coordination in inductive coding tasks. Combining surface metrics (ROUGE, toxicity, confidence) with geometric analysis of output embeddings (TwoNN intrinsic dimensionality) and extensive linguistic features (ELFEN), the authors show that LLM groups converge lexically and semantically, and that this convergence is accompanied by semantic compression and asymmetric influence patterns. The study demonstrates the value of black-box interaction analysis for uncovering coordination strategies and highlights risks such as semantic drift and loss of subthemes, which can inform future human–AI collaborative annotation designs. Overall, the approach offers a scalable, interpretable lens on collective reasoning in autonomous agent ensembles, with practical implications for coordinated, transparent annotation workflows.
Abstract
Large language models (LLMs) are increasingly deployed in collaborative settings, yet little is known about how they coordinate when treated as black-box agents. We simulate 7,500 multi-agent, multi-round discussions in an inductive coding task, generating over 125,000 utterances that capture both final annotations and their interactional histories. We introduce process-level metrics (code stability, semantic self-consistency, and lexical confidence) alongside sentiment and convergence measures to track coordination dynamics. To probe deeper alignment signals, we analyze the evolving geometry of output embeddings and show that intrinsic dimensionality declines over rounds, suggesting semantic compression. The results reveal that LLM groups converge lexically and semantically, develop asymmetric influence patterns, and exhibit negotiation-like behaviors despite the absence of explicit role prompting. This work demonstrates how black-box interaction analysis can surface emergent coordination strategies, offering a scalable complement to internal probe-based interpretability methods.
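To make the intrinsic-dimensionality measurement concrete, the following is a minimal sketch of a TwoNN-style estimate computed from a matrix of embeddings. It uses the maximum-likelihood form of the estimator, d = N / sum(log(r2 / r1)), where r1 and r2 are each point's distances to its first and second nearest neighbors. This is an illustration under stated assumptions and not necessarily the authors' implementation: the function name `twonn_dimension` is hypothetical, and the synthetic data stands in for real utterance embeddings.

```python
import numpy as np


def twonn_dimension(points):
    """Estimate intrinsic dimension from nearest-neighbor distance ratios.

    Maximum-likelihood variant of TwoNN (an assumption of this sketch):
    d = N / sum(log(r2 / r1)), with r1, r2 the distances from each point
    to its first and second nearest neighbors.
    """
    x = np.asarray(points, dtype=float)
    if x.ndim != 2 or x.shape[0] < 3:
        raise ValueError("need a 2-D array with at least 3 points")

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from rounding
    np.fill_diagonal(d2, np.inf)     # exclude each point's distance to itself

    # The two smallest squared distances in every row.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])

    # Drop exact duplicates (r1 == 0), which make the ratio undefined.
    keep = r1 > 0
    log_mu = np.log(r2[keep] / r1[keep])
    total = np.sum(log_mu)
    if total <= 0:
        raise ValueError("degenerate neighbor distances; cannot estimate")
    return keep.sum() / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for embeddings: a 2-D sheet inside 20-D space.
    sheet = np.hstack([rng.random((1000, 2)), np.zeros((1000, 18))])
    print(f"estimated intrinsic dimension: {twonn_dimension(sheet):.2f}")
```

Applied separately to the embeddings of each discussion round, an estimate of this kind that falls from round to round would correspond to the decline the abstract interprets as semantic compression.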
