Emergent Convergence in Multi-Agent LLM Annotation

Angelina Parfenova, Alexander Denzler, Juergen Pfeffer

TL;DR

This work introduces a scalable multi-agent LLM annotation framework deployed over many rounds to study emergent coordination in inductive coding tasks. By coupling surface metrics (ROUGE, toxicity, confidence) with geometric analyses of output embeddings (TwoNN intrinsic dimensionality) and extensive linguistic features (ELFEN), the authors show that LLM groups exhibit lexical and semantic convergence accompanied by semantic compression and asymmetric influence patterns. The study demonstrates the value of black-box interaction analysis for uncovering coordination strategies, while highlighting risks such as semantic drift and loss of subthemes, guiding future human–AI collaborative annotation designs. Overall, the approach offers a scalable, interpretable lens on collective reasoning in autonomous agent ensembles and points to practical implications for coordinated, transparent annotation workflows.
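The TL;DR lists ROUGE among the surface metrics, and Figure 3 tracks pairwise ROUGE between models across rounds. The snippet below is a minimal sketch of that idea. It computes a simplified ROUGE-1 F1 and averages it over all model pairs in a round. The simplification uses lowercased whitespace tokens and no stemming, so a standard ROUGE package would tokenize differently. The model keys and code strings are hypothetical illustrations, not data from the study.

```python
from collections import Counter
from itertools import combinations


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercased whitespace tokens (no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each token counted at most min(count_cand, count_ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_rouge1(codes_by_model: dict[str, str]) -> float:
    """Average ROUGE-1 F1 over all unordered model pairs in one discussion round."""
    pairs = list(combinations(codes_by_model.values(), 2))
    return sum(rouge1_f1(a, b) for a, b in pairs) / len(pairs)


# Hypothetical codes from three agents for the same text, early vs. late in a discussion.
round_1 = {
    "llama4-maverick": "distrust of institutions",
    "llama3.3-70b": "skepticism toward government",
    "deepseek-r1-70b": "loss of public trust",
}
round_4 = {
    "llama4-maverick": "institutional distrust",
    "llama3.3-70b": "institutional distrust and skepticism",
    "deepseek-r1-70b": "institutional distrust",
}

for name, codes in [("round 1", round_1), ("round 4", round_4)]:
    print(name, round(mean_pairwise_rouge1(codes), 3))
```

With these made-up inputs the average rises from about 0.095 in round 1 to about 0.778 in round 4. This is the kind of upward trend Figure 3 is described as showing. ROUGE-2 and ROUGE-L would replace unigram counts with bigram counts and longest-common-subsequence length, respectively.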

Abstract

Large language models (LLMs) are increasingly deployed in collaborative settings, yet little is known about how they coordinate when treated as black-box agents. We simulate 7,500 multi-agent, multi-round discussions in an inductive coding task, generating over 125,000 utterances that capture both final annotations and their interactional histories. We introduce process-level metrics (code stability, semantic self-consistency, and lexical confidence) and combine them with sentiment and convergence measures to track coordination dynamics. To probe deeper alignment signals, we analyze the evolving geometry of output embeddings and find that intrinsic dimensionality declines over rounds, suggesting semantic compression. The results reveal that LLM groups converge lexically and semantically, develop asymmetric influence patterns, and exhibit negotiation-like behaviors despite the absence of explicit role prompting. This work demonstrates how black-box interaction analysis can surface emergent coordination strategies, offering a scalable complement to internal probe-based interpretability methods.
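Figure 5's caption defines two of these metrics. Code stability is 1 minus the change rate, that is, the proportion of tokens retained between rounds. Self-consistency is the cosine similarity of TF-IDF vectors between consecutive rounds. The sketch below computes both with the standard library only, under several assumptions: whitespace tokenization, set-based token retention, and an IDF fitted on each pair of consecutive outputs. The smoothed IDF formula mirrors scikit-learn's `TfidfVectorizer` default. The authors' exact preprocessing is not given here, and the code history below is hypothetical.

```python
import math
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; the paper's tokenizer is not specified here.
    return text.lower().split()


def code_stability(prev: str, curr: str) -> float:
    """1 - change rate, read here as the share of the previous round's distinct tokens kept in the current round."""
    prev_set, curr_set = set(tokens(prev)), set(tokens(curr))
    if not prev_set:
        return 1.0
    return len(prev_set & curr_set) / len(prev_set)


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Raw term counts weighted by smoothed IDF, ln((1 + n) / (1 + df)) + 1."""
    counts = [Counter(tokens(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def self_consistency(prev: str, curr: str) -> float:
    """Cosine similarity of TF-IDF vectors for one model's codes in consecutive rounds."""
    u, v = tfidf_vectors([prev, curr])  # IDF fitted on just this pair (an assumption)
    return cosine(u, v)


# Hypothetical codes emitted by one model over three rounds.
history = ["distrust of institutions", "institutional distrust", "institutional distrust"]
for r in range(1, len(history)):
    prev, curr = history[r - 1], history[r]
    print(f"round {r}->{r + 1}: stability={code_stability(prev, curr):.3f}, "
          f"self-consistency={self_consistency(prev, curr):.3f}")
```

Fitting the IDF on the full corpus of codes, rather than on a single pair, would change the term weights and therefore the scores. Cosine similarity is unaffected by L2 normalization, so the sketch skips that step.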

Paper Structure

This paper contains 33 sections, 2 equations, 16 figures, 4 tables, and 1 algorithm.

Figures (16)

  • Figure 1: Overview of our multi-agent simulation framework. LLM agents iteratively exchange outputs via a shared conversational memory, progressing from Round 1 to Round $N$. Over rounds, codes move from dispersed to clustered in semantic space, while ROUGE increases and intrinsic dimensionality (TwoNN-Id) decreases, indicating lexical convergence and semantic compression.
  • Figure 2: UMAP projection of LLM-generated codes before and after four rounds of multi-agent discussion (5 models). Each point represents a single code, colored by model type. Pre-discussion codes are more dispersed in embedding space (left), while post-discussion codes form tighter clusters with greater cross-model overlap (right). This reflects both lexical convergence and a form of semantic compression, where diverse initial proposals collapse into a lower-dimensional, more aligned representation.
  • Figure 3: ROUGE Score Convergence Across Rounds for Three LLMs. This plot shows ROUGE-1, ROUGE-2, and ROUGE-L similarity scores between pairs of models (Llama4 Maverick, Llama3.3 70B, and Deepseek-R1 70B) across successive discussion rounds. Scores are computed based on model-generated codes at each round, capturing convergence in lexical overlap over time.
  • Figure 4: Density plots of normalized opinion vs. confidence across discussion rounds. Each subplot represents a 2D histogram of model utterances in a given round, showing how expressed opinions (x-axis) relate to confidence scores (y-axis). Darker regions indicate higher concentration of utterances. Over rounds, the distribution evolves from distinct opinion-confidence clusters (Round 1) to more dispersed and overlapping patterns (Rounds 3–5).
  • Figure 5: Model Stability and Self-consistency Across Rounds. Top: Code Stability (1 - change rate) measures the proportion of tokens retained between rounds, reflecting how much models revise their outputs. Bottom: Self-consistency Score is computed as the cosine similarity of TF-IDF vectors between consecutive rounds, indicating semantic consistency.
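Figure 1 and the abstract report that the TwoNN intrinsic dimensionality of code embeddings falls across rounds. TwoNN (Facco et al., 2017) uses only each point's ratio μ = r₂/r₁ of second- to first-nearest-neighbor distance. Under local uniformity, μ follows a Pareto law whose shape equals the dimension d, so a maximum-likelihood estimate is d̂ = N / Σ log μᵢ. The sketch below uses that MLE form and assumes embeddings are stored as rows of a NumPy array. The original TwoNN paper instead fits a line to the empirical distribution of μ after discarding a fraction of the largest ratios. Which variant and which embedding model the authors used is not stated here.

```python
import numpy as np


def twonn_dimension(X: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form) for points in the rows of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X * X, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip small negative values from round-off
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor
    # After partitioning at index 1, column 0 holds the nearest and column 1 the second-nearest distance.
    nearest_two = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest_two[:, 0], nearest_two[:, 1]
    keep = r1 > 0                    # skip points that have an exact duplicate
    mu = r2[keep] / r1[keep]
    # If mu ~ Pareto(shape=d) on [1, inf), the MLE of d is N / sum(log mu).
    return float(keep.sum() / np.sum(np.log(mu)))


# Sanity check on synthetic data (not embeddings from the study):
rng = np.random.default_rng(0)
plane = rng.uniform(size=(1500, 2))
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))    # orthonormal 10x2 basis
X_plane = plane @ basis.T                             # 1500 points on a 2-D plane inside R^10
X_line = rng.uniform(size=(1000, 1)) @ rng.normal(size=(1, 3))  # points on a line in R^3
print(twonn_dimension(X_plane), twonn_dimension(X_line))
```

The estimates should land near 2 and 1, even though the ambient dimensions are 10 and 3. Applied to the study's setting, one would embed each round's codes with a sentence-embedding model (not specified here) and compare the estimates round by round. A falling value is what the paper describes as semantic compression. The Gram-matrix distance computation needs O(N²) memory, so large rounds would call for a nearest-neighbor index instead.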