The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration
Kotaro Furuya, Yuichi Kitagawa
TL;DR
The paper tackles the problem of assembling synergistic multi-agent teams of large language models without relying on knowledge of the models' internal architectures or training data. It introduces an interaction-centric framework that builds a language model graph from pairwise dialogues, assigns edge weights using cosine similarities between embeddings, and then applies Louvain-based community detection to identify functionally cohesive model clusters. Empirical results show that topic-guided conversations yield communities that consistently outperform random baselines and whose downstream task performance approaches that of manually curated, specialization-based teams. This approach enables automated, scalable design of collaborative LLM teams and can be integrated with task-driven planning to form targeted, interdisciplinary subteams. The work highlights the importance of topical context and suggests avenues for scaling and for richer collaboration protocols.
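The graph-construction step can be illustrated with a minimal sketch. It is not the authors' implementation and rests on stated assumptions: each pairwise dialogue is assumed to have already been embedded turn by turn, and "semantic coherence" is assumed to be the mean cosine similarity between consecutive turns. The helper names (`cosine`, `coherence`, `build_graph`) and the toy two-dimensional vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def coherence(turns):
    """Hypothetical coherence score: mean cosine similarity of consecutive turns."""
    sims = [cosine(a, b) for a, b in zip(turns, turns[1:])]
    return sum(sims) / len(sims)


def build_graph(models, dialogues):
    """Map every unordered model pair to an edge weight.

    `dialogues[(a, b)]` is assumed to hold the turn embeddings of the
    conversation between models a and b (at least two turns).
    """
    return {(a, b): coherence(dialogues[(a, b)]) for a, b in combinations(models, 2)}


# Toy usage with made-up 2-D "embeddings" (not real model outputs).
models = ["A", "B", "C"]
dialogues = {
    ("A", "B"): [[1.0, 0.0], [1.0, 0.1], [0.9, 0.1]],  # turns stay on topic
    ("A", "C"): [[1.0, 0.0], [0.0, 1.0]],              # abrupt topic shift
    ("B", "C"): [[1.0, 1.0], [1.0, 0.0]],
}
weights = build_graph(models, dialogues)
print(weights)
```

In this toy input the A-B conversation stays on topic and receives the highest weight, while the A-C conversation, whose two turns are orthogonal vectors, receives a weight of 0. A real pipeline would substitute a sentence-embedding model for the hand-written vectors.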
Abstract
While a multi-agent approach based on large language models (LLMs) is a promising strategy for surpassing the capabilities of single models, its success depends critically on synergistic team composition. However, forming optimal teams is a significant challenge, as the inherent opacity of most models obscures the internal characteristics necessary for effective collaboration. In this paper, we propose an interaction-centric framework for automatic team composition that requires no prior knowledge of the models, such as their internal architectures, training data, or task performance. Our method constructs a "language model graph" that maps relationships between models from the semantic coherence of their pairwise conversations, and then applies community detection to identify synergistic model clusters. Our experiments with diverse LLMs demonstrate that the proposed method discovers functionally coherent groups that reflect the models' latent specializations. Priming the conversations with specific topics identifies synergistic teams that outperform random baselines on downstream benchmarks and achieve accuracy comparable to that of manually curated teams based on known model specializations. Our findings provide a new basis for the automated design of collaborative multi-agent LLM teams.
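Louvain community detection optimizes modularity by repeatedly moving single nodes into the neighbouring community that most increases modularity and then collapsing each community into a super-node. The sketch below implements only the first (local-moving) phase and a modularity check, on a hypothetical weighted graph of two tightly linked model triads joined by one weak edge. The function names and edge weights are illustrative assumptions rather than the paper's code; in practice a library routine such as NetworkX's `louvain_communities` would typically be used.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    degree = defaultdict(float)
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
    two_m = sum(degree.values())
    internal = sum(w for (u, v), w in edges.items() if community[u] == community[v])
    sigma = defaultdict(float)
    for node, k in degree.items():
        sigma[community[node]] += k
    return 2 * internal / two_m - sum(s * s for s in sigma.values()) / two_m**2


def local_moving(edges, max_passes=100):
    """First phase of Louvain: greedy single-node moves that raise modularity.

    `edges` maps (u, v) pairs (u != v) to positive weights.
    Returns {node: community_id}.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    nodes = sorted(adj)
    degree = {n: sum(adj[n].values()) for n in nodes}
    two_m = sum(degree.values())
    community = {n: i for i, n in enumerate(nodes)}  # start from singletons
    sigma_tot = {i: degree[n] for i, n in enumerate(nodes)}

    for _ in range(max_passes):
        moved = False
        for n in nodes:
            own = community[n]
            sigma_tot[own] -= degree[n]  # temporarily isolate n
            links = defaultdict(float)  # edge weight from n to each community
            for nbr, w in adj[n].items():
                links[community[nbr]] += w

            def gain(c):
                # Proportional to the modularity change of inserting n into c.
                return links[c] - sigma_tot[c] * degree[n] / two_m

            best, best_gain = own, gain(own)
            for c in list(links):
                if gain(c) > best_gain + 1e-12:  # move only on strict improvement
                    best, best_gain = c, gain(c)
            community[n] = best
            sigma_tot[best] += degree[n]
            moved = moved or best != own
        if not moved:
            break
    return community


# Hypothetical coherence weights: two tight triads joined by one weak edge.
edges = {
    ("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0,
    ("D", "E"): 1.0, ("D", "F"): 1.0, ("E", "F"): 1.0,
    ("C", "D"): 0.1,
}
teams = local_moving(edges)
print(teams, round(modularity(edges, teams), 3))
```

On this toy graph the local-moving pass separates {A, B, C} from {D, E, F}, and the resulting partition has higher modularity than the all-singletons starting point. A full Louvain run would next aggregate each community into a single node and repeat the pass until modularity stops improving.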
