HieraMAS: Optimizing Intra-Node LLM Mixtures and Inter-Node Topology for Multi-Agent Systems

Tianjun Yao, Zhaoyi Li, Zhiqiang Shen

TL;DR

HieraMAS is a hierarchical collaboration framework that combines intra-node LLM mixtures with an inter-node communication topology and introduces supernodes, where each functional role is implemented by multiple heterogeneous LLMs using a propose-synthesis structure.

Abstract

Multi-agent systems (MAS) built on large language models (LLMs) have shown strong performance across many tasks. Most existing approaches improve only one aspect at a time, such as the communication topology, role assignment, or LLM routing, while treating each agent as a single, indivisible unit. This misses the opportunity to use mixtures of LLMs within an agent to strengthen role-specific abilities. We propose HieraMAS, a hierarchical collaboration framework that combines intra-node LLM mixtures with an inter-node communication topology. HieraMAS introduces supernodes, where each functional role is implemented by multiple heterogeneous LLMs using a propose-synthesis structure. Optimizing HieraMAS creates unique credit-assignment challenges: final task performance depends heavily on the underlying LLMs' capabilities, which can lead reinforcement methods to incorrectly reward suboptimal configurations. To address this, we use a two-stage algorithm: (1) multi-level reward attribution, which provides fine-grained feedback at both the node level and the overall system level; (2) graph classification for topology selection, which treats choosing the communication structure as a holistic decision rather than optimizing edges one by one. Experiments on reasoning and coding benchmarks show that HieraMAS substantially outperforms existing methods while also delivering better cost-performance trade-offs.
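
The propose-synthesis structure of a supernode can be illustrated with a minimal sketch. This is not the authors' implementation: the `Supernode` class, the `LLM` callable type, and the stub models below are hypothetical names introduced only for illustration, and the prompt format is an assumption. The sketch assumes that several heterogeneous models each draft a proposal for the same role, and that one further model merges the drafts into a single output for that role.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in: an "LLM" is any function from a prompt to a reply.
LLM = Callable[[str], str]


@dataclass
class Supernode:
    """One functional role backed by several LLMs (illustrative only)."""
    role: str
    proposers: List[LLM]
    synthesizer: LLM

    def run(self, task: str) -> str:
        # Propose: every proposer answers the same role-specific prompt.
        proposals = [p(f"[{self.role}] {task}") for p in self.proposers]
        # Synthesis: one model sees all proposals and produces the role output.
        numbered = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(proposals))
        return self.synthesizer(
            f"[{self.role}] Task: {task}\nProposals:\n{numbered}"
        )


def stub_llm(name: str) -> LLM:
    """Deterministic fake model so the sketch runs without any API."""
    def call(prompt: str) -> str:
        return f"{name}-answer"
    return call


def stub_synthesizer(prompt: str) -> str:
    """Toy synthesizer: echoes the last proposal line it was shown."""
    return prompt.splitlines()[-1]


node = Supernode("solver", [stub_llm("a"), stub_llm("b")], stub_synthesizer)
result = node.run("2+2")
```

In a real system the stubs would be replaced by calls to different model backends, and which models fill the proposer and synthesizer slots would be what the learned policy selects.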

Paper Structure (41 sections, 6 theorems, 34 equations, 4 figures, 13 tables, 1 algorithm)

Key Result

Theorem 3.1

Consider optimizing a multi-agent system with $N$ supernodes and a communication graph $G \in \mathcal{G}$.

Figures (4)

  • Figure 1: Illustration of two credit-assignment challenges in joint optimization and our solutions. Challenge 1: final task rewards mask individual node errors; for example, Node 2 produces an incorrect output but still receives a high reward, $R_2=0.92$. HieraMAS addresses this with multi-level rewards that provide effective per-role attribution ($R_2^{\text{eff}}=-0.23$). Challenge 2: per-edge optimization suffers from entangled attribution, where edges may be falsely reinforced or suppressed. HieraMAS reformulates topology selection as a holistic graph classification task, using a graph generator to produce candidates and a graph classifier to select the optimal topology.
  • Figure 2: The overall framework of HieraMAS. By optimizing a policy learner $\pi_m$ with multi-level rewards (Stage 1) and a graph classifier $f_G(\cdot)$ with contrastive rewards (Stage 2), HieraMAS learns to select optimal supernode configurations and communication topologies. During inference, the trained modules jointly determine the supernode configurations and graph topology, then execute the MAS to produce the final answer.
  • Figure 3: Analysis of learned topologies on MMLU-Redux. (a) Visualization of the top-3 most frequently selected graph structures with their density. (b) Pairwise Jaccard similarity between top-5 graphs, showing low structural overlap.
  • Figure 4: Dataset-level LLM selection preferences learned by HieraMAS. Higher normalized logits indicate a stronger selection preference.
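
The two quantities named in the Figure 3 caption, graph density and pairwise Jaccard similarity, can be computed as in the sketch below. The definitions used here are assumptions, since the caption does not spell them out: density is taken as the number of directed edges divided by $n(n-1)$, and Jaccard similarity is taken over directed edge sets. The example graphs are made up for illustration and are not topologies reported in the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Edge = Tuple[int, int]          # directed edge (source, target)
Graph = FrozenSet[Edge]


def density(edges: Graph, num_nodes: int) -> float:
    """Assumed definition: |E| / (n * (n - 1)) for a directed graph, no self-loops."""
    if num_nodes < 2:
        return 0.0
    return len(edges) / (num_nodes * (num_nodes - 1))


def jaccard(a: Graph, b: Graph) -> float:
    """Jaccard similarity of two edge sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(union)


def pairwise_jaccard(graphs: Dict[str, Graph]) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of named graphs."""
    return {
        (u, v): jaccard(graphs[u], graphs[v])
        for u, v in combinations(sorted(graphs), 2)
    }


# Hypothetical 4-node topologies, not taken from the paper.
graphs = {
    "chain": frozenset({(0, 1), (1, 2), (2, 3)}),
    "star": frozenset({(0, 1), (0, 2), (0, 3)}),
    "full_dag": frozenset({(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}),
}
similarities = pairwise_jaccard(graphs)
chain_density = density(graphs["chain"], 4)
```

Low values in `similarities` correspond to the "low structural overlap" described in panel (b): the graphs share few edges relative to the edges either one uses.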

Theorems & Definitions (11)

  • Definition 1: Supernode
  • Theorem 3.1
  • Proposition 1: Gradient Bias under Final Reward
  • proof
  • Corollary 3.1: Sufficient Condition for Gradient with Correct Sign
  • proof
  • Proposition 2: Credit Assignment Error in Per-Edge Optimization
  • proof
  • Corollary 3.2: Justification for Holistic Graph Selection
  • Theorem 3.3: Generalization Guarantee for Graph Classifier
  • ...and 1 more