Problem-Solving in Language Model Networks
Ciaran Regan, Alexandre Gournail, Mizuki Oka
TL;DR
This study extends multi-agent debate to graph-based network topologies to assess how network structure, self-reflection, and bias affect the question-answering performance of language-model agents. Comparing scale-free, random, fully connected, and fully disconnected networks over four rounds of debate on 100 MMLU math questions with GPT-3.5-Turbo, the work shows that random networks match the performance of fully connected networks while using far fewer tokens, and that bias concentrated at hub nodes can dramatically alter outcomes. Strong consensus among agents often coincides with correct answers, whereas disagreement correlates with incorrect ones, which makes the degree of consensus a useful indicator of uncertainty. These findings inform scalable design choices for collaborative AI systems: cost-effective random topologies, scale-free networks with knowledgeable agents at the hubs, and consensus metrics to gauge confidence in collective decisions.
Abstract
To improve the reasoning and question-answering capabilities of Large Language Models (LLMs), several multi-agent approaches have been introduced. While these methods enhance performance, the application of collective intelligence-based approaches to complex network structures and the dynamics of agent interactions remain underexplored. This work extends the concept of multi-agent debate to more general network topologies, measuring question-answering accuracy, influence, consensus, and the effects of bias on the collective. The results show that random networks perform similarly to fully connected networks despite using significantly fewer tokens. Furthermore, a strong consensus among agents correlates with correct answers, whereas divided responses typically indicate incorrect answers. Analysing the influence of the agents reveals a balance between self-reflection and interconnectedness: self-reflection helps when an agent's local interactions are incorrect, and local interactions help when the agent itself is incorrect. Additionally, bias plays a strong role in system performance, with correctly biased hub nodes boosting performance. These insights suggest that using random networks, or scale-free networks with knowledgeable agents placed in central positions, can enhance the overall question-answering performance of multi-agent systems.
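The link between consensus and correctness can be illustrated with a minimal sketch. The function name `consensus`, the sample answers, and the choice of "share of agents giving the most common answer" as the measure are all assumptions made for illustration; the paper's own consensus metric may be defined differently.

```python
from collections import Counter


def consensus(answers):
    """Return the most common answer and the share of agents that gave it.

    Hypothetical helper: majority share is one simple consensus measure,
    not necessarily the one used in the paper.
    """
    if not answers:
        raise ValueError("answers must be non-empty")
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


# Hypothetical final-round answers from five agents on one
# multiple-choice question (made-up data, not results from the paper).
unified = ["B", "B", "B", "B", "B"]
divided = ["A", "B", "C", "B", "D"]

print(consensus(unified))  # ('B', 1.0)
print(consensus(divided))  # ('B', 0.4)
```

Under this reading, a share close to 1.0 would mark a collective answer that is more likely to be correct, while a low share would flag a question on which the network is uncertain.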
