Adaptive In-conversation Team Building for Language Model Agents

Linxin Song; Jiale Liu; Jieyu Zhang; Shaokun Zhang; Ao Luo; Shijian Wang; Qingyun Wu; Chi Wang

TL;DR

This work tackles the challenge of forming effective teams of language-model agents for complex tasks by introducing Captain Agent, an adaptive team-building framework. Captain Agent dynamically constructs and manages subteams for each solving step, using nested group conversations and a reflector to ensure diverse expertise and prevent output stagnation. Across six real-world scenarios, it achieves a 21.94% average accuracy improvement over strong baselines without task-specific prompt engineering, and ablation studies confirm the value of adaptive team-building, tool/agent libraries, reflection, and backbone choices. The results suggest a scalable, cost-aware approach to multi-agent collaboration, with implications for deploying adaptable, domain-aware agent systems in practice.

Abstract

Leveraging multiple large language model (LLM) agents has been shown to be a promising approach for tackling complex tasks, yet the effective design of multiple agents for a particular application remains an art. It is thus intriguing to answer a critical question: Given a task, how can we build a team of LLM agents to solve it effectively? Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent. It dynamically forms and manages teams for each step of a task-solving process, utilizing nested group conversations and reflection to ensure diverse expertise and prevent stereotypical outputs, allowing for a flexible yet structured approach to problem-solving. A comprehensive evaluation across six real-world scenarios demonstrates that Captain Agent significantly outperforms existing multi-agent methods with a 21.94% improvement in average accuracy, providing outstanding performance without requiring task-specific prompt engineering. Our exploration of different backbone LLMs and our cost analysis further show that Captain Agent can improve the conversation quality of weak LLMs and achieve competitive performance at extremely low cost, which illuminates the application of multi-agent systems.
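The per-step loop described in the abstract (form a team, run a nested conversation, reflect on the result) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Agent` class, the skill-overlap ranking in `build_team`, the `converse` and `reflect` callables, and the retry policy are all hypothetical stand-ins chosen only to make the control flow concrete.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Agent:
    # Hypothetical agent record: a name plus a set of skill tags.
    name: str
    skills: frozenset


def build_team(library: Sequence[Agent], needed: Set[str], size: int = 2) -> List[Agent]:
    """Pick up to `size` agents whose skills overlap most with the step's needs.

    Assumption: simple tag overlap stands in for whatever selection
    mechanism a real system would use.
    """
    ranked = sorted(library, key=lambda a: len(a.skills & needed), reverse=True)
    return [a for a in ranked[:size] if a.skills & needed]


def solve(
    steps: Sequence[Tuple[str, Set[str]]],
    library: Sequence[Agent],
    converse: Callable[[List[Agent], str], str],
    reflect: Callable[[str, str], bool],
    max_retries: int = 1,
) -> List[Tuple[str, List[str], str]]:
    """For each step: build a team, run a nested conversation, then reflect.

    If the (hypothetical) reflector rejects a result, the nested
    conversation is retried up to `max_retries` more times.
    """
    log = []
    for description, needed in steps:
        team = build_team(library, needed)
        result = ""
        for _ in range(max_retries + 1):
            result = converse(team, description)
            if reflect(description, result):
                break
        log.append((description, [a.name for a in team], result))
    return log
```

In this sketch the team is rebuilt for every step, which is the contrast with a static team fixed once at the start; a real system would replace `converse` with an LLM-backed group chat and `reflect` with an LLM-based reviewer.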

Paper Structure (36 sections, 1 equation, 7 figures, 8 tables)

Introduction
Captain Agent
Adaptive Multi-agent Team Building
Nested Conversation and Reflection
Benefits over Static Team
Evaluation
Experimental Setup
Evaluation Protocol
Main Results
Analysis and Ablation Studies
Static vs. adaptive team-building
Tool library and agent library
Ablation on Reflector
Ablation on LLM Backbone and Cost Analysis
Agent Selected in Each Scenario
...and 21 more sections

Figures (7)

Figure 1: The overall workflow of Captain Agent. Captain Agent can build an agent team apart from the main conversation and make next-step decisions according to the nested conversation results. We highlight the reading order in the figure and refer to the marked steps in the following sections.
Figure 2: Ablation comparison between static and adaptive teams on the selected subset. Adaptive team-building during the conversation improves performance in different scenarios.
Figure 3: Ablation of reflection mechanism in Captain Agent. Reflector improves Captain Agent across all scenarios.
Figure 4: Comparison of performance on our reduced dataset for the ablation study (numerical results can be found in the paper's cost table). Captain Agent achieves the best performance with gpt-4-0125-preview. Captain Agent with gpt-4o-mini achieves performance competitive with other baselines that use gpt-4-0125-preview, at significantly lower cost.
Figure 5: Top-10 selected agents and the number of times each was selected. The selected agents are closely related to the scenario, and Verification_Expert has a high selection rate.
...and 2 more figures
