
Multi-LLM Collaborative Search for Complex Problem Solving

Sen Yang, Yafu Li, Wai Lam, Yu Cheng

TL;DR

This work introduces Mixture-of-Search-Agents (MoSA), a multi-LLM framework that embeds diverse, independent exploration and collaborative refinement within a Monte Carlo Tree Search (MCTS) backbone to tackle complex reasoning tasks. By assigning LLMs the roles of Proposers and Aggregators, MoSA broadens the search frontier and improves the quality of individual reasoning steps through both model diversity and neural aggregation. Across GSM8K, SVAMP, MATH-500, and StrategyQA, MoSA delivers consistent gains over single-LLM and baseline multi-agent methods, with notable improvements on harder problems such as MATH-500 and evidence of a synergy between agent diversity and search. The work demonstrates the practical value of coordinated multi-agent search for robust, scalable reasoning and highlights the benefit of extended action sets for complex tasks.

Abstract

Large language models (LLMs) often struggle with complex reasoning tasks due to their limitations in addressing the vast reasoning space and inherent ambiguities of natural language. We propose the Mixture-of-Search-Agents (MoSA) paradigm, a novel approach leveraging the collective expertise of multiple LLMs to enhance search-based reasoning. MoSA integrates diverse reasoning pathways by combining independent exploration with iterative refinement among LLMs, mitigating the limitations of single-model approaches. Using Monte Carlo Tree Search (MCTS) as a backbone, MoSA enables multiple agents to propose and aggregate reasoning steps, resulting in improved accuracy. Our comprehensive evaluation across four reasoning benchmarks demonstrates MoSA's consistent performance improvements over single-agent and other multi-agent baselines, particularly in complex mathematical and commonsense reasoning tasks.
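To make the "MCTS backbone" concrete, here is a minimal sketch of two generic MCTS pieces that frame MoSA's multi-agent expansion: UCT-based selection and reward backpropagation. It assumes the textbook UCT rule with an exploration constant of c = 1.4; the paper's actual selection formula, reward signal, and node representation are not given in this summary, and `Node`, `select`, and `backpropagate` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity semantics: nodes are compared with `is`, not by value
class Node:
    """A search-tree node: a partial reasoning trace plus visit statistics (hypothetical)."""
    state: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # sum of rewards backed up through this node

    def uct(self, c: float = 1.4) -> float:
        """Textbook UCT score (assumed, not from the paper); unvisited nodes come first."""
        if self.visits == 0:
            return math.inf
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def select(root: Node) -> Node:
    """Descend from the root, always taking the child with the highest UCT score."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.uct())
    return node


def backpropagate(leaf: Node, reward: float) -> None:
    """Add one visit and the reward to every node on the path back to the root."""
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent


if __name__ == "__main__":
    root = Node("Q")
    step_a = Node("Q -> step A", parent=root)
    step_b = Node("Q -> step B", parent=root)
    root.children = [step_a, step_b]
    backpropagate(step_a, 1.0)
    print(select(root).state)  # the unvisited sibling is chosen next
```

In MoSA, the expansion step that creates new children is where multiple LLMs propose and aggregate reasoning steps; the selection and backup logic above is the generic scaffolding around it.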

Paper Structure

This paper contains 22 sections, 16 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Reasoning performance on MATH-500 against search trajectory diversity. While the diversity of single-LLM search varies with the sampling temperature, multi-LLM search consistently achieves superior performance. More details are provided in the diversity analysis section.
  • Figure 2: Top: An overview of the root node $s_0$ and its expanded child nodes. Bottom: The detailed framework for generating new actions (i.e., sampling sub-questions and sub-answers).
  • Figure 3: Generating three new actions with MoSA. Left: MoSA proposes sub-questions and sub-answers. Right: MoSA aggregates candidate sub-answers.
  • Figure 4: Diversity versus accuracy. T denotes the sampling temperature.
  • Figure 5: Reasoning accuracy with different numbers of distinct LLMs as search agents.
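Figures 2 and 3 describe how MoSA generates new actions: proposer LLMs sample sub-questions and sub-answers, and an aggregator LLM merges candidate sub-answers into a refined one. The sketch below illustrates that idea under our own assumptions, not the authors' implementation: `expand`, `Action`, and the prompt strings are hypothetical; the proposer for each sub-question is drawn uniformly at random; every proposer answers it; and a single aggregator merges the candidates. An LLM is modeled as any Python callable from prompt to completion, so stand-in functions can replace real model calls.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: an "LLM" is any callable mapping a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Action:
    """One reasoning step: a sub-question and its (aggregated) sub-answer."""
    sub_question: str
    sub_answer: str


def expand(state: str, proposers: List[LLM], aggregator: LLM,
           n_actions: int = 3, seed: int = 0) -> List[Action]:
    """Sketch of one node expansion with multiple search agents.

    Assumptions (not taken from the paper): the proposer that poses each
    sub-question is drawn uniformly at random, every proposer answers it,
    and one aggregator LLM merges the candidate answers.
    """
    rng = random.Random(seed)
    actions: List[Action] = []
    for _ in range(n_actions):
        # Independent exploration: a randomly chosen proposer poses the next sub-question.
        asker = rng.choice(proposers)
        question = asker(f"{state}\nPropose the next sub-question:")
        # Diversity: each proposer produces its own candidate sub-answer.
        candidates = [p(f"{state}\n{question}\nAnswer:") for p in proposers]
        # Neural aggregation: one LLM reads all candidates and writes a refined answer.
        listing = "\n".join(f"Candidate {i + 1}: {c}" for i, c in enumerate(candidates))
        refined = aggregator(f"{state}\n{question}\n{listing}\nAggregated answer:")
        actions.append(Action(question, refined))
    return actions


if __name__ == "__main__":
    def stand_in(name: str) -> LLM:
        """Toy model that ignores the prompt and returns a tag (illustration only)."""
        return lambda prompt: f"[{name}]"

    agents = [stand_in("llm-a"), stand_in("llm-b")]
    for action in expand("Q: What is 2 + 3 * 4?", agents, stand_in("aggregator")):
        print(action)
```

Modeling agents as plain callables keeps the expansion logic independent of any particular model API; swapping the random proposer choice for round-robin or per-role assignment would be a one-line change.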