FlowReasoner: Reinforcing Query-Level Meta-Agents

Hongcheng Gao, Yue Liu, Yufei He, Longxu Dou, Chao Du, Zhijie Deng, Bryan Hooi, Min Lin, Tianyu Pang

TL;DR

FlowReasoner introduces a query-level meta-agent that designs a per-query multi-agent system by learning to reason from external execution feedback. The approach combines a reasoning-based warmup via supervised fine-tuning with reinforcement learning using GRPO to optimize workflows for each user query, guided by a multi-objective reward. Experiments on BigCodeBench, HumanEval, and MBPP show FlowReasoner-14B consistently outperforms task-level baselines and handcrafted workflows, achieving about a 10.5-point advantage over o1-mini and a 5-point gain over MaAS. The work demonstrates strong generalization across worker models and releases code for public use, highlighting a scalable path to automatic, per-query MAS design.
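The TL;DR names GRPO as the RL algorithm. GRPO does not train a separate value model. Instead, it scores each sampled output against the other outputs sampled for the same prompt. The minimal Python sketch below shows that group-relative normalization for several workflows sampled for one query: each reward is centered on the group mean and divided by the group standard deviation. It illustrates generic GRPO and is not code from the FlowReasoner repository. The function name, the use of the population standard deviation, and the `eps` stabilizer are our own assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for G outputs sampled from the same query.

    Each reward is centered on the group mean and scaled by the group's
    (population) standard deviation; `eps` avoids division by zero when
    all rewards in the group are equal.
    """
    if not rewards:
        raise ValueError("need at least one sampled output")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: rewards for four workflows sampled for one query.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

Workflows that beat their group's average receive positive advantages and are reinforced. When every workflow in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.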

Abstract

This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, we first distill DeepSeek R1 to endow FlowReasoner with the basic ability to reason about generating multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward guides the RL training in terms of performance, complexity, and efficiency. In this manner, FlowReasoner can generate a personalized multi-agent system for each user query through deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% in accuracy across the three benchmarks. The code is available at https://github.com/sail-sg/FlowReasoner.
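The abstract states that the RL reward covers performance, complexity, and efficiency, but it does not give the reward's form. The Python sketch below shows one hypothetical way to fold three such signals into a scalar reward. It uses a weighted sum that credits the test pass rate of the generated workflow and subtracts normalized penalties for workflow size and worker calls. The `WorkflowOutcome` fields, the weights, the normalization caps, and the linear form are all illustrative assumptions, not the formula used by FlowReasoner.

```python
from dataclasses import dataclass


@dataclass
class WorkflowOutcome:
    """Hypothetical execution statistics for one generated multi-agent system."""
    pass_rate: float        # fraction of unit tests passed (performance)
    num_agents: int         # number of agents/operators in the workflow (complexity)
    num_worker_calls: int   # worker-model calls made while executing it (efficiency)


def multi_purpose_reward(
    outcome: WorkflowOutcome,
    w_perf: float = 1.0,
    w_complexity: float = 0.1,
    w_efficiency: float = 0.05,
    max_agents: int = 10,
    max_calls: int = 20,
) -> float:
    """Hypothetical weighted reward: performance minus capped, normalized costs."""
    complexity_cost = min(outcome.num_agents / max_agents, 1.0)
    efficiency_cost = min(outcome.num_worker_calls / max_calls, 1.0)
    return (
        w_perf * outcome.pass_rate
        - w_complexity * complexity_cost
        - w_efficiency * efficiency_cost
    )


# Example: a workflow passing all tests with 5 agents and 10 worker calls.
print(multi_purpose_reward(WorkflowOutcome(pass_rate=1.0, num_agents=5, num_worker_calls=10)))
```

Each cost is capped at 1.0, so the penalties stay bounded. As a result, in this sketch a workflow that passes every test always outscores one that passes none, as long as `w_perf` exceeds the sum of the two penalty weights.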

Paper Structure

This paper contains 14 sections, 5 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Task-Level Meta-Agents vs. Query-Level Meta-Agents at Inference Time. $q$ denotes a user query, e.g., build a 2048 game. $t\sim P(q)$ denotes one kind of task, e.g., a code generation task, which is a distribution over user queries. Given $t$, a previous task-level meta-agent $\mathcal{A}_{\text{meta\_task}}$ searches for a task-specific multi-agent system $\mathcal{S}_{\text{task}}$ to solve all queries sampled from $t$, i.e., one system per task. In contrast, given a single user query $q^{(i)}$, our query-level meta-agent $\mathcal{A}_{\text{meta\_query}}$ reasons and outputs a query-specific multi-agent system $\mathcal{S}_{\text{query}}^{(i)}$ for $q^{(i)}$, i.e., one system per query.
  • Figure 2: Architectural Comparison of Three Multi-Agent Systems. (a) Manually-designed Multi-agent System, (b) Search-based Automatic Multi-agent System, and (c) Reasoning-based Automatic Multi-agent System.
  • Figure 3: Training Pipeline of FlowReasoner. It consists of (1) Reasoning Data Distillation, (2) Reasoning SFT Warmup, and (3) Reinforcing Reasoning from External Execution Feedback.
  • Figure 4: Ablation of Meta-agents and Workers. (a) Accuracy of different meta-agents with o1-mini as workers. (b) Accuracy of the generated workflows with different workers.
  • Figure 5: Example Workflows Generated by FlowReasoner-14B for Tasks from BigCodeBench and HumanEval.
  • ...and 6 more figures
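Figure 1 contrasts the two inference-time settings. The Python sketch below restates that contrast as two control flows. A task-level meta-agent is invoked once on queries sampled from the task, and its single system is reused for every query. A query-level meta-agent is invoked on each query and returns a fresh system for it. Here a "system" is modeled simply as a callable from query to answer. `solve_task_level`, `solve_query_level`, and `toy_meta_agent` are hypothetical names for illustration, not the FlowReasoner API.

```python
from typing import Callable, Sequence

Query = str
Answer = str
System = Callable[[Query], Answer]  # a multi-agent system, abstracted as query -> answer


def solve_task_level(
    task_samples: Sequence[Query],
    queries: Sequence[Query],
    search_task_system: Callable[[Sequence[Query]], System],
) -> list[Answer]:
    """One system per task: search once on samples from t, reuse it for all queries."""
    system = search_task_system(task_samples)
    return [system(q) for q in queries]


def solve_query_level(
    queries: Sequence[Query],
    meta_agent: Callable[[Query], System],
) -> list[Answer]:
    """One system per query: the meta-agent designs a new system for each q."""
    return [meta_agent(q)(q) for q in queries]


# Toy (hypothetical) query-level meta-agent: picks a different "system" per query.
def toy_meta_agent(q: Query) -> System:
    if "game" in q:
        return lambda x: f"[planner+coder+tester] {x}"
    return lambda x: f"[coder] {x}"


print(solve_query_level(["build a 2048 game", "reverse a string"], toy_meta_agent))
```

The trade-off shows up in the call pattern. The task-level path pays its search cost once. The query-level path calls the meta-agent once per query, and in exchange each query gets a system tailored to it.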