AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Siyu Wang; Ruotian Lu; Zhihao Yang; Yuchao Wang; Yanzhou Zhang; Lei Xu; Qimin Xu; Guojun Yin; Cailian Chen; Xinping Guan

TL;DR

This work proposes AgentConductor, a reinforcement-learning-optimized multi-agent system (MAS) built around an LLM-based orchestrator agent that generates interaction topologies dynamically, end to end, in a feedback-driven manner. It also designs a novel topological density function that captures communication-aware mathematical characterizations of multi-agent interactions.

Abstract

Large language model (LLM)-driven multi-agent systems (MAS) coordinate specialized agents through predefined interaction topologies and have shown promise for complex tasks such as competition-level code generation. Recent studies demonstrate that carefully designed multi-agent workflows and communication graphs can significantly improve code generation performance by leveraging collaborative reasoning. However, existing methods neither adapt topology density to task difficulty nor iteratively refine the topology within an instance using execution feedback, which leads to redundant communication and performance bottlenecks. To address these issues, we propose AgentConductor, a reinforcement-learning-optimized MAS built around an LLM-based orchestrator agent that generates interaction topologies dynamically, end to end, driven by feedback. For each query, AgentConductor infers agent roles and task difficulty, then constructs a task-adapted, density-aware layered directed acyclic graph (DAG) topology. Two key innovations underpin this construction. First, we design a novel topological density function that captures communication-aware mathematical characterizations of multi-agent interactions. Second, we partition task difficulty into intervals, which avoids excessive pruning, allows a precise upper bound on topological density to be measured for each difficulty level, and gives finer-grained control. Empirically, across three competition-level and two foundational code datasets, AgentConductor achieves state-of-the-art accuracy, outperforming the strongest baseline by up to 14.6% in pass@1 accuracy, 13% in density reduction, and 68% in token cost reduction.

Paper Structure

This paper contains 61 sections, 1 theorem, 29 equations, 8 figures, 5 tables, and 1 algorithm.

Introduction
AgentConductor
Problem Definition
Interaction Topology Notations
AgentConductor Paradigm
Graph Density Evaluation Function
SFT data Generation
Reinforcing Dynamic Topologies for LLM-MA via Trajectory-Level Policy Optimization
GRPO-Based Training for Dynamic Topology Generation
Design of a Rule-Based Multi-Objective Reward Function
Execution Result Reward
Interaction Graph Complexity Reward Function
Experiments
Experimental Setup
Datasets and Metrics
...and 46 more sections

Key Result

Theorem 1

Given a DAG $\mathcal{G}^{(k)}$ defined by manager-guided multi-agent interaction, $\mathcal{G}^{(k)}$ is a $b$-partite graph, that is, a partite graph with $b$ parts. Then $d^{(k)} = b$, where $d^{(k)}$ is the depth of $\mathcal{G}^{(k)}$.
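
The relationship in Theorem 1 can be checked on a toy example. The sketch below is not the authors' code: it assumes that depth is counted as the number of nodes on the longest directed path, that edges only run from an earlier layer to a later one, and that every layer after the first is reached from the layer before it. The function name `layered_dag_depth` and the agent names are hypothetical.

```python
from collections import defaultdict


def layered_dag_depth(layers, edges):
    """Return the number of nodes on the longest directed path."""
    # Map each node to its layer index so that edge direction can be validated.
    layer_of = {node: i for i, layer in enumerate(layers) for node in layer}
    successors = defaultdict(list)
    for src, dst in edges:
        # Assumption: edges only go from an earlier layer to a later one.
        if layer_of[src] >= layer_of[dst]:
            raise ValueError(f"edge {src}->{dst} does not go forward")
        successors[src].append(dst)

    # longest[node] = number of nodes on the longest path starting at node.
    longest = {}
    # Visit layers from last to first so every successor is already solved.
    for layer in reversed(layers):
        for node in layer:
            longest[node] = 1 + max(
                (longest[s] for s in successors[node]), default=0
            )
    return max(longest.values())


# Hypothetical three-part topology: one planner, two coders, one tester.
layers = [["planner"], ["coder_a", "coder_b"], ["tester"]]
edges = [
    ("planner", "coder_a"),
    ("planner", "coder_b"),
    ("coder_a", "tester"),
    ("coder_b", "tester"),
]
depth = layered_dag_depth(layers, edges)
```

Here the graph has three parts and the longest path (for example planner, coder_a, tester) visits three nodes, so the computed depth matches the number of parts, as the theorem states.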

Figures (8)

Figure 1: YAML representation of the topology, its mapping to the actual graph, and the two-turn graph evolution.
Figure 2: Comparison of topology structures and optimization paradigms between our method and classic baselines.
Figure 3: Overall framework of the proposed AgentConductor. The approach proceeds in three stages: (1) SFT on diverse topologies to instill structural priors in the base LLM (Qwen-2.5-Instruct-3B); (2) RL with GRPO to learn task-adaptive, difficulty-aware topology policies from execution feedback, yielding the orchestrator agent; and (3) multi-turn dynamic topology generation for end-to-end code problem solving.
Figure 4: (a) APPS results showing performance, average graph density (higher $\mathcal{S}_{\text{complex}}$ means a sparser graph), and completion tokens, with circle size indicating token savings (a larger diameter means more savings). (b) Code generation performance comparison of representative baselines.
Figure 5: Comparison of the average topology density (higher $\mathcal{S}_{\text{complex}}$ means a sparser graph) across three competition-level code datasets at three difficulty levels.
...and 3 more figures
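
Stage (2) in the Figure 3 caption trains the orchestrator with GRPO. As a rough illustration of the group-relative advantage used in standard GRPO, the sketch below normalizes the rewards of several sampled outputs for one query against the group mean and standard deviation. It is not the paper's training code: the reward values are made up, the function name is hypothetical, and using the population standard deviation with a small `eps` is an implementation choice assumed here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its sampling group (standard GRPO form)."""
    mean = statistics.fmean(rewards)
    # Assumption: population standard deviation; eps guards against a
    # zero-variance group where every sample earned the same reward.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for four topologies sampled for the same query.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, so the policy update favors the better topologies within each group without a separate value model.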

Theorems & Definitions (3)

Definition 1
Theorem 1
proof
