Evolving and Executing Research Plans via Double-Loop Multi-Agent Collaboration

Zhi Zhang, Yan Liu, Zhejing Hu, Gong Chen, Sheng-hua Zhong, Jiannong Cao

TL;DR

The paper tackles automated scientific research by formulating it as a bilevel optimization problem: \( \max_{p \in \mathcal{P}} R(p, y^*(p)) \) subject to \( y^*(p) \in \arg\max_{y \in \mathcal{Y}(p)} f(p,y) \). It introduces the Double-Loop Multi-Agent (DLMA) framework, in which a leader loop of professor agents evolves a population of plans through involvement, improvement, and integration meetings, and a follower loop of doctoral student agents executes the chosen plan using pre-hoc and post-hoc meetings, contextual and external observations, and continual draft refinement. Experiments on ACLAward and Laboratory show state-of-the-art automatic evaluation scores, and ablation studies confirm that both loops are essential: evolution drives novelty while execution ensures soundness. DLMA advances automated scientific discovery by integrating literature review, experimentation, and drafting, but it incurs significant computational cost and remains prone to problems such as code-generation hallucinations, which motivates future work on efficiency and reliability.
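To make the nested structure of the bilevel formulation concrete, here is a minimal sketch that solves a toy instance by brute-force enumeration. It is an illustration only, not the paper's method: the plan names, the feasible sets, and the functions `f` and `R` are all hypothetical, and ties in the inner `arg max` are broken by taking the first maximizer.

```python
from typing import Callable, Hashable, Iterable, Tuple


def solve_bilevel(
    plans: Iterable[Hashable],
    executions_for: Callable[[Hashable], Iterable[int]],
    follower_score: Callable[[Hashable, int], float],
    leader_reward: Callable[[Hashable, int], float],
) -> Tuple[Hashable, int, float]:
    """Enumerate plans p; for each, solve the inner problem, then compare R."""
    best = None
    for p in plans:
        # Inner (follower) problem: y*(p) maximizes f(p, y) over Y(p).
        y_star = max(executions_for(p), key=lambda y: follower_score(p, y))
        # Outer (leader) objective is evaluated at the follower's best response.
        reward = leader_reward(p, y_star)
        if best is None or reward > best[2]:
            best = (p, y_star, reward)
    return best


# Hypothetical toy instance: two plans with different feasible executions.
feasible = {"safe": [1, 2], "bold": [1, 2, 3]}


def f(p: str, y: int) -> float:
    # Follower objective: "safe" is best executed at y=2, "bold" at y=3.
    return -abs(y - (2 if p == "safe" else 3))


def R(p: str, y: int) -> float:
    # Leader reward: execution quality plus a bonus for the bolder plan.
    return y + (1 if p == "bold" else 0)


result = solve_bilevel(feasible.keys(), lambda p: feasible[p], f, R)
print(result)  # ('bold', 3, 4)
```

The point of the sketch is that a plan's reward cannot be judged in isolation: it depends on how well the plan can be executed, which is why the leader objective takes \( y^*(p) \) as an argument.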

Abstract

Automating the end-to-end scientific research process poses a fundamental challenge: it requires both evolving high-level plans that are novel and sound, and executing these plans correctly amidst dynamic and uncertain conditions. To address this bilevel challenge, we propose a novel Double-Loop Multi-Agent (DLMA) framework to solve the given research problem automatically. The leader loop, composed of professor agents, is responsible for evolving research plans. It employs an evolutionary algorithm through involvement, improvement, and integration meetings to iteratively generate and refine a pool of research proposals, exploring the solution space effectively. The follower loop, composed of doctoral student agents, is responsible for executing the best-evolved plan. It dynamically adjusts the plan during implementation via pre-hoc and post-hoc meetings, ensuring each step (e.g., drafting, coding) is well-supported by contextual and external observations. Extensive experiments on benchmarks like ACLAward and Laboratory show that DLMA generates research papers that achieve state-of-the-art scores in automated evaluation, significantly outperforming strong baselines. Ablation studies confirm the critical roles of both loops, with evolution driving novelty and execution ensuring soundness.
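The two loops described in the abstract can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: plans are tuples of integers instead of research proposals, plain functions stand in for the agent meetings, and mapping the involvement, improvement, and integration meetings onto injection, mutation-like edits, and crossover-like merges is an interpretation of the abstract. All function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Plan = Tuple[int, ...]  # toy stand-in for a research plan


def leader_loop(
    pool: Sequence[Plan],
    score: Callable[[Plan], float],
    involve: Callable[[random.Random], Plan],
    improve: Callable[[Plan, random.Random], Plan],
    integrate: Callable[[Plan, Plan], Plan],
    generations: int,
    rng: random.Random,
) -> Plan:
    """Evolve a pool of plans and return the highest-scoring one."""
    pool = list(pool)
    size = len(pool)
    for _ in range(generations):
        newcomers = [involve(rng)]                  # assumed: inject a fresh plan
        improved = [improve(p, rng) for p in pool]  # assumed: mutation-like edit
        a, b = rng.sample(pool, 2)
        merged = [integrate(a, b)]                  # assumed: crossover-like merge
        # Parents stay in the candidate set, so the best score never decreases.
        candidates = pool + newcomers + improved + merged
        pool = sorted(candidates, key=score, reverse=True)[:size]
    return pool[0]


def follower_loop(
    plan: Plan,
    act: Callable[[int], int],
    supported: Callable[[int, int], bool],
    revise: Callable[[int, int], int],
    max_revisions: int = 3,
) -> List[Tuple[int, int]]:
    """Execute each step, revising it while observations do not support it."""
    log = []
    for step in plan:
        obs = act(step)
        for _ in range(max_revisions):
            if supported(step, obs):
                break
            step = revise(step, obs)  # adjust the step, then act again
            obs = act(step)
        log.append((step, obs))
    return log


# Hypothetical toy problem: plans score higher the closer each entry is to 5.
rng = random.Random(0)
score = lambda p: -sum((x - 5) ** 2 for x in p)
involve = lambda r: tuple(r.randint(0, 9) for _ in range(4))
integrate = lambda a, b: a[:2] + b[2:]


def improve(p: Plan, r: random.Random) -> Plan:
    i = r.randrange(len(p))
    return p[:i] + (p[i] + r.choice((-1, 1)),) + p[i + 1:]


initial = [involve(rng) for _ in range(6)]
best = leader_loop(initial, score, involve, improve, integrate, 30, rng)

# Toy execution: a step counts as "supported" when its observation is even.
log = follower_loop(
    best,
    act=lambda s: s * 3,
    supported=lambda s, o: o % 2 == 0,
    revise=lambda s, o: s + 1,
)
```

In this sketch the leader loop only ever sees scores, while the follower loop only ever sees one plan and its observations, which mirrors the division of labor the abstract describes between evolving plans and executing the selected one.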

Paper Structure

This paper contains 14 sections, 6 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Double-loop learning involves two loops. In the first loop, the system takes actions based on the current goals, while in the second loop, the goals themselves can be questioned and modified.
  • Figure 2: Overview of the double-loop learning framework. It consists of two loops: the leader loop and the follower loop. In the leader loop, we build a pool of potential solutions and evolve them iteratively. In the follower loop, we execute the most promising plan and dynamically adapt actions.
  • Figure 3: Ablation study on the ACLAward dataset with the ACL review and ICLR review.
  • Figure 4: Two case studies comparing human expert research output with the output of the Double-Loop Multi-Agent (DLMA) framework.
  • Figure 5: Support rate of the $t$-th planning step in the to-do list before the pre-hoc meeting and after the post-hoc meeting. Higher scores indicate better alignment between observations and the plan.