SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning

Wanjia Zhao, Mert Yuksekgonul, Shirley Wu, James Zou

TL;DR

SiriuS addresses the challenge of optimizing multi-agent LLM systems by learning from outcome-driven interaction trajectories rather than intermediate steps. It introduces an experience library of high-quality reasoning trajectories and a trajectory augmentation mechanism to enrich training data, enabling iterative self-improvement through fine-tuning. Empirical results demonstrate significant gains in reasoning accuracy and biomedical QA, as well as improved negotiation performance in competitive settings, across multiple domains and backbone models. The framework also shows robust generalization to new configurations, offering a scalable data-generation pipeline for self-correction and self-play enhancement in future multi-agent AI systems.

Abstract

Multi-agent AI systems powered by large language models (LLMs) are increasingly applied to solve complex tasks. However, these systems often rely on fragile, manually designed prompts and heuristics, making optimization difficult. A key challenge in optimizing multi-agent systems is acquiring suitable training data for specialized agents. We introduce SiriuS, a self-improving, reasoning-driven optimization framework for multi-agent systems. Central to our approach is the construction of an experience library: a repository of high-quality reasoning trajectories. The library is built by retaining reasoning steps that lead to successful outcomes, providing a robust training set for optimizing multi-agent systems. Additionally, we introduce a library augmentation procedure that refines unsuccessful trajectories, further enriching the library. SiriuS boosts performance by 2.86% to 21.88% on reasoning and biomedical QA and enhances agent negotiation in competitive settings. Our results show that SiriuS enhances multi-agent performance while generating reusable data for self-correction and self-play enhancement in the future.
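
The library construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`Step`, `ExperienceLibrary`, `build_library`, and the toy callbacks) is hypothetical, the agents are stubbed with plain functions instead of LLM calls, and the `augment` callback only stands in for whatever refinement procedure the paper uses on unsuccessful trajectories. The fine-tuning step that would consume the library is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    agent: str      # hypothetical: name of the agent that produced this step
    prompt: str     # input the agent saw
    response: str   # reasoning/output the agent produced


Trajectory = Tuple[List[Step], object]  # (per-agent steps, final answer)


@dataclass
class ExperienceLibrary:
    # Per-agent training examples, keyed by agent name.
    examples: Dict[str, List[Step]] = field(default_factory=dict)

    def add_trajectory(self, steps: List[Step]) -> None:
        for step in steps:
            self.examples.setdefault(step.agent, []).append(step)


def build_library(
    tasks: list,
    run_agents: Callable[[object], Trajectory],
    is_correct: Callable[[object, object], bool],
    augment: Optional[Callable[[object, List[Step]], Optional[Trajectory]]] = None,
) -> ExperienceLibrary:
    """Keep trajectories with successful outcomes; optionally retry failures."""
    library = ExperienceLibrary()
    for task in tasks:
        steps, answer = run_agents(task)
        if is_correct(task, answer):
            library.add_trajectory(steps)
        elif augment is not None:
            retry = augment(task, steps)  # assumed refinement hook
            # A refined trajectory is stored only if its outcome is correct.
            if retry is not None and is_correct(task, retry[1]):
                library.add_trajectory(retry[0])
    return library


# Toy stand-ins for LLM agents: tasks are (a, b, gold_sum) triples.
def toy_run(task):
    a, b, _gold = task
    guess = a + b if a != 0 else -1  # deliberately wrong when a == 0
    return [Step("solver", f"{a}+{b}=?", str(guess))], guess


def toy_is_correct(task, answer):
    return answer == task[2]


def toy_augment(task, steps):
    a, b, _gold = task
    fixed = a + b
    return [Step("solver", f"{a}+{b}=?", str(fixed))], fixed


tasks = [(1, 2, 3), (0, 5, 5)]
plain = build_library(tasks, toy_run, toy_is_correct)
augmented = build_library(tasks, toy_run, toy_is_correct, toy_augment)
```

In this toy run, the library built without augmentation keeps only the trajectory that succeeded on the first attempt, while the augmented one also recovers the failed task. Filtering on the final outcome rather than on per-step labels is what lets the sketch collect per-agent examples without intermediate supervision.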

Paper Structure

This paper contains 34 sections, 5 equations, 7 figures, 7 tables, 2 algorithms.

Figures (7)

  • Figure 1: General training pipeline of SiriuS. Agents solve problems sequentially, storing correct responses for fine-tuning and augmenting incorrect ones through feedback, regeneration, and rephrasing. This iterative process improves performance via reward-based evaluation and supervised fine-tuning. The module colors in the figure correspond to those in the paper's method algorithm.
  • Figure 2: Resource Exchange Game: Player 1 (25Xs + 5Ys), Player 2 (5Xs + 25Ys). Win Rate in decisive games and Payoff in all games. We show Player 2 Win rate/payoff in all cells.
  • Figure 3: Player 1's payoff in the Ultimatum game with Initial Resource settings of 100. SiriuS as Player 1 can effectively secure a higher share of the split.
  • Figure 4: Final Selling Price for a Seller&Buyer with object valuations of 40 and 60. A higher number means the seller gets a greater payoff.
  • Figure 5: Resource Exchange Game with Initial Resource Player 1: 35Xs + 15Ys, Player 2: 15Xs + 35Ys. Win Rate in decisive games and Payoff in all games. We show Player 2 Win rate/payoff in all cells.
  • ...and 2 more figures