HieraMAS: Optimizing Intra-Node LLM Mixtures and Inter-Node Topology for Multi-Agent Systems

Tianjun Yao; Zhaoyi Li; Zhiqiang Shen

TL;DR

HieraMAS is a hierarchical collaboration framework that combines intra-node LLM mixtures with an inter-node communication topology. It introduces supernodes, in which each functional role is implemented by multiple heterogeneous LLMs using a propose-synthesis structure.
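A minimal sketch of the propose-synthesis structure inside one supernode follows. It assumes each LLM can be modeled as a plain function from prompt to text. The `Supernode` class, the `majority_synthesizer` helper and the stub proposers are hypothetical illustrations, not the paper's implementation. The toy majority-vote synthesizer stands in for whatever aggregation HieraMAS actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in: an "LLM" is any function mapping a prompt to a text response.
LLM = Callable[[str], str]


@dataclass
class Supernode:
    """One functional role realized by several heterogeneous LLMs (hypothetical sketch)."""
    role: str
    proposers: List[LLM]  # each proposer drafts an answer independently
    synthesizer: Callable[[str, List[str]], str]  # merges the drafts into one output

    def run(self, task: str) -> str:
        prompt = f"[{self.role}] {task}"
        # Propose step: every proposer answers the same role-specific prompt.
        proposals = [llm(prompt) for llm in self.proposers]
        # Synthesis step: one aggregator combines the proposals into the node's output.
        return self.synthesizer(prompt, proposals)


def majority_synthesizer(prompt: str, proposals: List[str]) -> str:
    # Toy synthesizer: return the most common proposal (ties go to the earliest one).
    return max(proposals, key=proposals.count)


node = Supernode(
    role="solver",
    proposers=[lambda p: "42", lambda p: "42", lambda p: "41"],  # stub "LLMs"
    synthesizer=majority_synthesizer,
)
print(node.run("What is 6 * 7?"))  # -> 42
```

Keeping the synthesizer as a pluggable callable makes it easy to swap the vote for an LLM-based aggregator without changing the supernode interface.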

Abstract

Multi-agent systems (MAS) built on large language models (LLMs) have shown strong performance across many tasks. Most existing approaches improve only one aspect at a time, such as the communication topology, role assignment, or LLM routing, while treating each agent as a single, indivisible unit. This misses the opportunity to use mixtures of LLMs within an agent to strengthen role-specific abilities. We propose HieraMAS, a hierarchical collaboration framework that combines intra-node LLM mixtures with an inter-node communication topology. HieraMAS introduces supernodes, where each functional role is implemented by multiple heterogeneous LLMs using a propose-synthesis structure. Optimizing HieraMAS creates unique credit-assignment challenges: final task performance depends heavily on the underlying LLMs' capabilities, so reinforcement learning methods can incorrectly reward suboptimal configurations. To address this, we use a two-stage algorithm: (1) multi-level reward attribution, which provides fine-grained feedback at both the node level and the overall system level; and (2) graph classification for topology selection, which treats choosing the communication structure as a holistic decision rather than optimizing edges one by one. Experiments on reasoning and coding benchmarks show that HieraMAS substantially outperforms existing methods while also delivering better cost-performance trade-offs.
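The abstract does not spell out how node-level and system-level feedback are combined. As a hedged illustration only, the sketch below assumes a linear blend controlled by a hypothetical weight `alpha`. It shows why a node-level term matters. With `alpha = 0`, every node inherits the shared final reward, so an erroneous node is still credited positively; this is the masking effect illustrated in Figure 1.

The function name, score ranges and weighting are assumptions. They do not reproduce the figure's numbers.

```python
from typing import Dict


def effective_rewards(
    system_reward: float,
    node_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend a shared system-level reward with per-node scores (hypothetical weighting).

    system_reward: task-level reward shared by every node (e.g. final-answer quality).
    node_scores:   per-node quality signal in [-1, 1] (e.g. a judge of each node's output).
    alpha:         weight on the node-level term; 0.0 means pure final-reward credit.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        node: alpha * score + (1.0 - alpha) * system_reward
        for node, score in node_scores.items()
    }


scores = {"node1": 0.8, "node2": -1.0}  # node2 produced a wrong intermediate output
final_only = effective_rewards(0.92, scores, alpha=0.0)  # node2 still gets +0.92
blended = effective_rewards(0.92, scores, alpha=0.7)     # node2 drops to about -0.42
```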

Paper Structure (41 sections, 6 theorems, 34 equations, 4 figures, 13 tables, 1 algorithm)

This paper contains 41 sections, 6 theorems, 34 equations, 4 figures, 13 tables, 1 algorithm.

Introduction
Preliminaries
Notation Establishment
MDP Formulation
Optimization Objective
Method
LLM Selection within Supernodes
Graph Topology Selection
Two-Stage Training Algorithm
Stage 1: Supernode Optimization with Random Graphs
Stage 2: Graph Classifier Training
Theoretical Analysis
Experiments
Experimental Setup
Datasets and Metrics.
...and 26 more sections

Key Result

Theorem 3.1

Consider optimizing a multi-agent system with $N$ supernodes and a communication graph $G \in \mathcal{G}$.
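Theorem 3.1 frames the optimization over a space of graphs $\mathcal{G}$. The sketch below illustrates the holistic alternative to per-edge optimization described in the abstract. A generator proposes complete candidate graphs, a scorer rates each graph as a whole, and the top-scoring graph is selected.

Several parts of the sketch are assumptions:

- The random-DAG generator and `toy_score`, a density heuristic, are hypothetical stand-ins for the paper's graph generator and trained classifier $f_G(\cdot)$.
- Edges are restricted to i -> j with i < j, which guarantees acyclicity.

```python
import itertools
import random
from typing import Callable, FrozenSet, List, Tuple

Edge = Tuple[int, int]
Graph = FrozenSet[Edge]


def random_dag(n: int, p: float, rng: random.Random) -> Graph:
    """Toy candidate generator: keep each forward edge i -> j (i < j) with probability p."""
    return frozenset(
        (i, j) for i, j in itertools.combinations(range(n), 2) if rng.random() < p
    )


def select_topology(candidates: List[Graph], score: Callable[[Graph], float]) -> Graph:
    """Holistic selection: score each complete graph and return the best one.

    No credit is assigned to individual edges; the whole structure is one decision.
    """
    return max(candidates, key=score)


def toy_score(g: Graph, n: int = 4, target_density: float = 0.5) -> float:
    # Hypothetical stand-in for a trained graph classifier: prefers a target edge density.
    density = len(g) / (n * (n - 1) / 2)
    return -abs(density - target_density)


rng = random.Random(0)
candidates = [random_dag(4, p, rng) for p in (0.2, 0.5, 0.9) for _ in range(5)]
best = select_topology(candidates, toy_score)
```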

Figures (4)

Figure 1: Illustration of two credit-assignment challenges in joint optimization and our solutions. Challenge 1: final task rewards mask individual node errors; Node 2 produces incorrect output but receives a high reward ($R_2=0.92$). HieraMAS addresses this with multi-level rewards that provide effective per-role attribution ($R_2^{\text{eff}}=-0.23$). Challenge 2: per-edge optimization suffers from entangled attribution, where edges may be falsely reinforced or suppressed. HieraMAS reformulates topology selection as a holistic graph classification task, using a graph generator to produce candidates and a graph classifier to select the optimal topology.
Figure 2: The overall framework of HieraMAS. By optimizing a policy learner $\pi_m$ with multi-level rewards (Stage 1) and a graph classifier $f_G(\cdot)$ with contrastive rewards (Stage 2), HieraMAS learns to select optimal supernode configurations and communication topologies. During inference, the trained modules jointly determine the supernode configurations and graph topology, then execute the MAS to produce the final answer.
Figure 3: Analysis of learned topologies on MMLU-Redux. (a) Visualization of the top-3 most frequently selected graph structures with their density. (b) Pairwise Jaccard similarity between top-5 graphs, showing low structural overlap.
Figure 4: Dataset-level LLM selection preferences learned by HieraMAS. Normalized logits indicate selection preference; higher values indicate a stronger preference.
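Figure 3(b) reports pairwise Jaccard similarity between the top learned graphs. The caption does not state how graphs are compared. Assuming similarity is computed over sets of directed edges, it can be sketched as below. The role names are hypothetical, and treating two empty graphs as identical is a convention chosen here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]  # directed edge (source role, target role)


def jaccard(a: FrozenSet[Edge], b: FrozenSet[Edge]) -> float:
    """Jaccard similarity of two edge sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 1.0  # convention (assumption): two empty graphs count as identical
    return len(a & b) / len(union)


def pairwise_jaccard(graphs: List[FrozenSet[Edge]]) -> Dict[Tuple[int, int], float]:
    # Similarity for every unordered pair of graph indices (i, j) with i < j.
    return {
        (i, j): jaccard(graphs[i], graphs[j])
        for i, j in combinations(range(len(graphs)), 2)
    }


# Hypothetical role names; edge direction matters, so g3 shares nothing with g1.
g1 = frozenset({("planner", "solver"), ("solver", "critic")})
g2 = frozenset({("planner", "solver"), ("planner", "critic")})
g3 = frozenset({("critic", "solver")})
sims = pairwise_jaccard([g1, g2, g3])  # {(0, 1): 1/3, (0, 2): 0.0, (1, 2): 0.0}
```

Low off-diagonal values, as reported in the caption, indicate that the selected topologies share few edges.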

Theorems & Definitions (11)

Definition 1: Supernode
Theorem 3.1
Proposition 1: Gradient Bias under Final Reward
proof
Corollary 3.1: Sufficient Condition for Gradient with Correct Sign
proof
Proposition 2: Credit Assignment Error in Per-Edge Optimization
proof
Corollary 3.2: Justification for Holistic Graph Selection
Theorem 3.3: Generalization Guarantee for Graph Classifier
...and 1 more