Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search

Wentao Shi, Zichun Yu, Fuli Feng, Xiangnan He, Chenyan Xiong

TL;DR

The paper tackles data synthesis for LLM-based multi-agent systems by addressing the misalignment between Q-value signals and training utility. It introduces Data Influence-oriented Tree Search (DITS), which uses influence scores computed for non-differentiable validation metrics to guide both tree search and data selection, aided by a gradient-to-inference approximation to reduce computational cost. Through iterative data synthesis and evaluation across eight datasets on Information Exchange and Debate tasks, DITS achieves state-of-the-art performance and demonstrates that allocating inference budgets to influence estimation yields superior training efficiency compared to Q-value–driven approaches. The work highlights the practical impact of influence-aware data selection for scalable, robust MAS self-training, while acknowledging societal and ethical considerations of deploying such systems.

Abstract

Monte Carlo Tree Search (MCTS) based methods provide promising approaches for generating synthetic data to enhance the self-training of Large Language Model (LLM) based multi-agent systems (MAS). These methods leverage Q-values to estimate individual agent contributions. However, relying solely on Q-values to identify informative data may misalign with the data synthesis objective, as the focus should be on selecting data that best enhances model training. To address this discrepancy, we propose Data Influence-oriented Tree Search (DITS), a novel framework that incorporates influence scores to guide both tree search and data selection. By leveraging influence scores, we effectively identify the most impactful data for system improvement, thereby enhancing model performance. Furthermore, we derive influence score estimation methods tailored for non-differentiable metrics, significantly reducing computational overhead by utilizing inference computations. Extensive experiments on eight multi-agent datasets demonstrate the robustness and effectiveness of the proposed methods. Notably, our findings reveal that allocating more inference resources to estimate influence scores, rather than Q-values, during data synthesis can more effectively and efficiently enhance model training.
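
The selection idea in the abstract can be pictured as ranking synthetic candidates by an influence score rather than by Q-value, then keeping a top fraction. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the helper name `select_top_fraction`, the toy score lists, and the 30% ratio (borrowed from the Figure 1 caption) are all hypothetical, and the estimation of the influence scores themselves is not shown.

```python
import math
from typing import List, Sequence


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Return the indices of the top `fraction` of items, ranked by score.

    Hypothetical helper for illustration; `scores` could hold either
    Q-values or influence scores for the same pool of candidates.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one item, rounding the cutoff up.
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the kept indices in their original order.
    return sorted(ranked[:k])


# Toy values (assumed, not from the paper): ten candidates, each with a
# Q-value and an influence score.
q_values = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.60, 0.50, 0.10, 0.95]
influence = [0.01, 0.30, -0.05, 0.25, 0.02, 0.40, 0.00, 0.03, 0.10, -0.02]

by_q = select_top_fraction(q_values, 0.3)
by_influence = select_top_fraction(influence, 0.3)

print("kept by Q-value:  ", by_q)
print("kept by influence:", by_influence)
print("overlap:          ", sorted(set(by_q) & set(by_influence)))
```

In this toy pool the two rankings agree on only one candidate, which mirrors the point that a high Q-value does not imply a high influence score.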

Paper Structure

This paper contains 23 sections, 16 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) The scatter plot and density plots of Q-values and influence scores for synthetic data. The top 30% of the data selected using DITS is highlighted in red. (b) Performance trends with different data synthesis budgets (Tokens).
  • Figure 2: Overview of our method. (a) illustrates the traversal of a cyclic agent network in topological order. We introduce virtual agents to distinguish the same agent in the traversal. (b) showcases the application of MCTS to generate synthetic multi-agent training data, where the color of each agent represents the magnitude of the node's Q-value. (c) depicts the computation process of influence scores for a non-differentiable metric, highlighting that data points with high Q-values may correspond to low influence scores.
  • Figure 3: The scatter plot and density plots of Q-values and influence scores for the synthetic data. The top 30% of the data selected by DITS is highlighted in red.
  • Figure 4: The effect of the selection ratio hyperparameter $\alpha$ on the performance of DITS on the 2WMH QA and TriviaQA datasets.
  • Figure 5: The relative performance improvement of DITS-iSFT-DPO across all datasets at different iterations. The best performance on each dataset is normalized to 1.0.
  • ...and 2 more figures