Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems

Shangbin Feng, Zifeng Wang, Palash Goyal, Yike Wang, Weijia Shi, Huang Xia, Hamid Palangi, Luke Zettlemoyer, Yulia Tsvetkov, Chen-Yu Lee, Tomas Pfister

TL;DR

This work tackles the challenge of designing effective multi-LLM systems by jointly optimizing their graph-based roles and weights. It introduces Heterogeneous Swarms, which alternates between role-step and weight-step optimizations: role-step learns DAG-structured input-output relations among LLMs via G-Decode and PSO, while weight-step assesses and tunes each LLM's contribution with JFK-score and PSO. The approach achieves state-of-the-art results across 12 tasks, highlighting significant collaborative gains, task-dependent importance of roles versus weights, and the benefits of diversity and dynamic adaptation. The proposed framework enables scalable, task-specific collaboration among heterogeneous LLMs with inference-time scaling and opportunities to reduce costs through sparsity.
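Both steps rely on particle swarm optimization (PSO). As a rough illustration of the general idea only, here is a minimal textbook PSO loop that maximizes a utility function over real-valued vectors. The function name `pso_maximize`, the hyperparameters (`inertia`, `c_personal`, `c_global`), and the toy utility are assumptions for this sketch; the paper's exact update rule may differ.

```python
import random

def pso_maximize(utility, dim, n_particles=20, n_steps=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Textbook PSO: maximize `utility` over vectors of length `dim`."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [utility(p) for p in pos]        # personal best utilities
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(n_steps):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = utility(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

In the setting described here, a particle would be either a flattened continuous adjacency matrix (role-step) or a set of model weights (weight-step), and `utility` would be the task score rather than a closed-form function.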

Abstract

We propose Heterogeneous Swarms, an algorithm to design multi-LLM systems by jointly optimizing model roles and weights. We represent multi-LLM systems as directed acyclic graphs (DAGs) of LLMs with topological message passing for collaborative generation. Given a pool of LLM experts and a utility function, Heterogeneous Swarms employs two iterative steps: role-step and weight-step. For role-step, we interpret model roles as learning a DAG that specifies the flow of inputs and outputs between LLMs. Starting from a swarm of random continuous adjacency matrices, we decode them into discrete DAGs, call the LLMs in topological order, evaluate on the utility function (e.g. accuracy on a task), and optimize the adjacency matrices with particle swarm optimization based on the utility score. For weight-step, we assess the contribution of individual LLMs in the multi-LLM systems and optimize model weights with swarm intelligence. We propose JFK-score to quantify the individual contribution of each LLM in the best-found DAG of the role-step, then optimize model weights with particle swarm optimization based on the JFK-score. Experiments demonstrate that Heterogeneous Swarms outperforms 15 role- and/or weight-based baselines by 18.5% on average across 12 tasks. Further analysis reveals that Heterogeneous Swarms discovers multi-LLM systems with heterogeneous model roles and substantial collaborative gains, and benefits from the diversity of language models.
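The role-step pipeline (continuous adjacency matrix, discrete DAG, LLM calls in topological order) can be sketched as follows. This is a simplified stand-in under stated assumptions, not the paper's G-Decode algorithm: `decode_dag` uses a fixed threshold and a ranking by outgoing weight, `run_dag` treats the last node in topological order as the output node, and the "models" are plain Python callables standing in for LLMs.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def decode_dag(adj, threshold=0.5):
    """Turn a continuous n x n matrix into a DAG as {node: set(predecessors)}.

    Assumption (not the paper's G-Decode): rank nodes by total outgoing
    weight and keep edge i -> j only if adj[i][j] > threshold and i is
    ranked before j. Edges that only point forward in a fixed ranking
    cannot form a cycle.
    """
    n = len(adj)
    order = sorted(range(n), key=lambda i: -sum(adj[i]))
    rank = {node: r for r, node in enumerate(order)}
    preds = {j: set() for j in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j] > threshold and rank[i] < rank[j]:
                preds[j].add(i)
    return preds

def run_dag(preds, models, query):
    """Call each model after its predecessors, passing their outputs along."""
    outputs = {}
    order = list(TopologicalSorter(preds).static_order())
    for node in order:
        upstream = [outputs[p] for p in sorted(preds[node])]
        outputs[node] = models[node](query, upstream)
    # Assumption: the last node in topological order produces the final answer.
    return outputs[order[-1]], order
```

Scoring the returned answer with the utility function gives the signal that the swarm optimizer uses to update each adjacency matrix.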

Paper Structure

This paper contains 43 sections, 6 equations, 15 figures, 11 tables, 4 algorithms.

Figures (15)

  • Figure 1: Our objective: given a pool of LLMs and a task utility function $f$, discover a multi-LLM system with graph-based model roles and adapted model weights tailored to $f$.
  • Figure 2: Overview of Heterogeneous Swarms: starting with a swarm of graphs represented by continuous adjacency matrices and a swarm of LLMs, Heterogeneous Swarms alternates between role-step and weight-step. In the role-step, we decode continuous adjacency matrices into discrete graphs, call the LLMs in topological order to fulfill a task, evaluate on the utility function, and optimize the adjacency matrices with particle swarm optimization. In the weight-step, models are randomly assigned to positions in the best-found network, evaluated by their individual contribution through the JFK-score, and then optimized with particle swarm optimization. PSO denotes particle swarm optimization (Sec. \ref{sec:preliminary}), G-Decode denotes Algorithm 2, and $f$ denotes the utility function.
  • Figure 3: Evaluating collaborative gains: we bucket problems by how many of the 10 individual LLMs solve them correctly. Top row: the problem count in each bucket and whether the multi-LLM system solves those problems correctly. Bottom row: accuracy of the multi-LLM system in each bucket, with expected accuracy denoted by the dotted line. Heterogeneous Swarms achieves collaborative gains ($\mathrm{C\text{-}Gain}$) of 0.143, 0.184, 0.101, and 0.426 on the four datasets, all $>0$, demonstrating consistent collaborative gains.
  • Figure 4: Analyzing the roles in Multi-LLM systems. Top left: the percentage of LLM roles aggregated per dataset. Bottom left: the change of LLM roles in the optimization process for NLGraph. Right: Per-LLM role distribution in the best-found multi-LLM system for NLGraph. Together these figures demonstrate the heterogeneous roles in the multi-LLM systems by Heterogeneous Swarms.
  • Figure 5: Heterogeneous Swarms with increasing levels of diversity in initial LLMs. Results show a general upward trend and an 89% increase on average from the least to most diverse models.
  • ...and 10 more figures