ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks

Heng Zhou; Hejia Geng; Xiangyuan Xue; Li Kang; Yiran Qin; Zhiyong Wang; Zhenfei Yin; Lei Bai

TL;DR

This work tackles the scalability and optimization challenges of reasoning with large language models by introducing ReSo, a reward-driven, self-organizing multi-agent system (MAS). ReSo combines task graph generation with a dynamic agent selection process guided by a Collaborative Reward Model (CRM), and it is paired with an automated data-synthesis framework that creates MAS benchmarks without human annotations. Key innovations include a Dynamic Agent Database, a two-stage agent search (coarse via UCB, fine-grained via CRM), and an MCTS-inspired perspective for navigating task graphs efficiently. Empirical results show ReSo matching or surpassing state-of-the-art methods on Math-MAS and SciBench-MAS and generalizing well to standard benchmarks, while ablations support the value of task decomposition, agent selection, and reward signaling. The approach demonstrates scalable, data-driven optimization of MAS cooperation, with open-source code and data to enable broader adoption and cross-domain application.

Abstract

Multi-agent systems (MAS) have emerged as a promising approach for enhancing the reasoning capabilities of large language models in complex problem-solving. However, current MAS frameworks suffer from poor flexibility and scalability, and their optimization strategies remain underdeveloped. To address these challenges, we propose ReSo, which integrates task graph generation with a reward-driven two-stage agent selection process. This process is centered on our Collaborative Reward Model, which provides fine-grained reward signals to optimize MAS cooperation. We also introduce an automated data synthesis framework for generating MAS benchmarks without any human annotations. Experimental results show that ReSo matches or outperforms existing methods, achieving 33.7% accuracy on Math-MAS and 32.3% accuracy on SciBench-MAS, where other approaches completely fail.
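To make the two-stage agent selection more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the coarse stage ranks agents with the standard UCB1 formula and that the fine-grained stage picks, from the shortlist, the agent whose output a reward model scores highest. The names `AgentStats`, `ucb_score`, `select_agent`, the `reward_fn` callback standing in for the CRM, and the parameters `top_k` and `c` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentStats:
    """Running statistics for one candidate agent (hypothetical record layout)."""
    name: str
    pulls: int = 0              # how many times this agent has been selected
    total_reward: float = 0.0   # sum of rewards it has received so far


def ucb_score(agent, total_pulls, c=1.4):
    """Standard UCB1 score: mean reward plus an exploration bonus."""
    if agent.pulls == 0:
        return math.inf  # untried agents are always explored first
    mean = agent.total_reward / agent.pulls
    return mean + c * math.sqrt(math.log(total_pulls) / agent.pulls)


def select_agent(agents, subtask, reward_fn, top_k=2, c=1.4):
    """Coarse UCB shortlist, then fine-grained choice by a reward function.

    reward_fn(agent, subtask) is a stand-in for a reward model such as the
    CRM; its signature is an assumption made for this sketch.
    """
    total_pulls = max(1, sum(a.pulls for a in agents))
    # Stage 1 (coarse): keep the top_k agents by UCB score.
    shortlist = sorted(
        agents, key=lambda a: ucb_score(a, total_pulls, c), reverse=True
    )[:top_k]
    # Stage 2 (fine-grained): score only the shortlisted agents.
    scored = [(reward_fn(a, subtask), a) for a in shortlist]
    best_reward, best = max(scored, key=lambda pair: pair[0])
    # Feed the reward back so later UCB rankings reflect it.
    best.pulls += 1
    best.total_reward += best_reward
    return best, best_reward
```

The design point this illustrates is cost: the inexpensive UCB ranking prunes the candidate pool, so the (presumably more expensive) reward evaluation runs only on a handful of agents per subtask.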
