Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems

Shangbin Feng; Zifeng Wang; Palash Goyal; Yike Wang; Weijia Shi; Huang Xia; Hamid Palangi; Luke Zettlemoyer; Yulia Tsvetkov; Chen-Yu Lee; Tomas Pfister

TL;DR

This work tackles the challenge of designing effective multi-LLM systems by jointly optimizing the graph-based roles and the weights of the participating models. It introduces Heterogeneous Swarms, which alternates between a role-step and a weight-step. The role-step learns DAG-structured input-output relations among LLMs via G-Decode and particle swarm optimization (PSO). The weight-step assesses each LLM's individual contribution with the JFK-score and tunes model weights with PSO. The approach outperforms 15 role- and/or weight-based baselines by 18.5% on average across 12 tasks. The analysis highlights substantial collaborative gains, a task-dependent importance of roles versus weights, and the benefits of model diversity and dynamic adaptation. The framework enables scalable, task-specific collaboration among heterogeneous LLMs. It also supports inference-time scaling and offers opportunities to reduce costs through sparsity.
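
Both steps described above rely on particle swarm optimization. The following is a rough illustration: a minimal global-best PSO sketch in Python that maximizes a black-box utility over flat real-valued vectors. These vectors stand in for flattened adjacency matrices in the role-step, or flattened model weights in the weight-step. The function name `pso_maximize`, the inertia, cognitive, and social coefficients, and the initialization range are illustrative assumptions, not the paper's settings or exact update rule.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def pso_maximize(
    utility: Callable[[Vector], float],
    dim: int,
    n_particles: int = 10,
    n_iters: int = 50,
    inertia: float = 0.5,
    cognitive: float = 1.5,
    social: float = 1.5,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Hypothetical sketch: standard global-best PSO that maximizes `utility`.

    Each particle is a flat vector. In this illustration it could stand for a
    flattened adjacency matrix (role-step) or flattened weights (weight-step).
    """
    rng = random.Random(seed)
    positions = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [utility(p) for p in positions]
    g = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward own best + pull toward swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            score = utility(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


if __name__ == "__main__":
    # Toy utility with its maximum at (0.3, 0.3, 0.3).
    best, score = pso_maximize(lambda v: -sum((x - 0.3) ** 2 for x in v), dim=3)
    print(best, score)
```

In the actual method, each utility evaluation would presumably involve running the multi-LLM system on task data. Evaluations are therefore far more expensive than in this toy setting, which makes the swarm size and iteration count the main cost knobs.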

Abstract

We propose Heterogeneous Swarms, an algorithm to design multi-LLM systems by jointly optimizing model roles and weights. We represent multi-LLM systems as directed acyclic graphs (DAGs) of LLMs with topological message passing for collaborative generation. Given a pool of LLM experts and a utility function, Heterogeneous Swarms alternates between two iterative steps: the role-step and the weight-step. In the role-step, we interpret model roles as learning a DAG that specifies the flow of inputs and outputs between LLMs. Starting from a swarm of random continuous adjacency matrices, we decode them into discrete DAGs, call the LLMs in topological order, evaluate the results with the utility function (e.g., accuracy on a task), and optimize the adjacency matrices with particle swarm optimization based on the utility score. In the weight-step, we assess the contribution of individual LLMs in the multi-LLM system and optimize model weights with swarm intelligence. We propose the JFK-score to quantify the individual contribution of each LLM in the best-found DAG of the role-step, then optimize model weights with particle swarm optimization based on the JFK-score. Experiments demonstrate that Heterogeneous Swarms outperforms 15 role- and/or weight-based baselines by 18.5% on average across 12 tasks. Further analysis reveals that Heterogeneous Swarms discovers multi-LLM systems with heterogeneous model roles and substantial collaborative gains, and that it benefits from the diversity of language models.
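
To make the DAG representation concrete, here is a minimal Python sketch. It decodes one continuous adjacency matrix into a discrete DAG and then calls models in topological order, passing each model the outputs of its parents. The decoding rule ranks nodes by outgoing minus incoming weight and keeps only forward edges above a threshold. That rule is an assumption chosen because it guarantees acyclicity; it is not the paper's G-Decode. The models are hypothetical stand-in functions rather than real LLM calls. Treating the last node in the order as the one that produces the final answer is likewise an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for an LLM: receives the task input and the outputs of
# its parent models, returns a text output. Real LLM calls would go here.
Model = Callable[[str, List[str]], str]


def decode_dag(
    adj: Sequence[Sequence[float]], threshold: float = 0.5
) -> Tuple[List[int], Dict[int, List[int]]]:
    """Decode a continuous n x n matrix into a DAG (assumed rule, not G-Decode).

    Nodes are ranked by total outgoing minus total incoming weight. An edge
    i -> j is kept only if i precedes j in that ranking and adj[i][j] exceeds
    `threshold`. Every kept edge points forward in one fixed ranking, so the
    graph is acyclic and the ranking is itself a topological order.
    Returns (topological order, map from node to its parent list).
    """
    n = len(adj)
    score = [sum(adj[i]) - sum(adj[j][i] for j in range(n)) for i in range(n)]
    order = sorted(range(n), key=lambda i: -score[i])  # stable: ties keep index order
    rank = {node: r for r, node in enumerate(order)}
    parents: Dict[int, List[int]] = {j: [] for j in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j and rank[i] < rank[j] and adj[i][j] > threshold:
                parents[j].append(i)
    return order, parents


def run_dag(models: List[Model], order: List[int],
            parents: Dict[int, List[int]], query: str) -> str:
    """Call models in topological order, passing each one its parents' outputs.

    Assumption: the last node in the order produces the system's final answer.
    """
    outputs: Dict[int, str] = {}
    for node in order:  # parents always appear earlier, so their outputs exist
        outputs[node] = models[node](query, [outputs[p] for p in parents[node]])
    return outputs[order[-1]]


def make_toy_model(k: int) -> Model:
    """Hypothetical toy model that just records what it received."""
    return lambda query, inputs: f"m{k}(" + ",".join([query] + inputs) + ")"


if __name__ == "__main__":
    adj = [[0.0, 0.9, 0.8],
           [0.1, 0.0, 0.7],
           [0.2, 0.3, 0.0]]
    order, parents = decode_dag(adj)
    models = [make_toy_model(k) for k in range(3)]
    print(order, parents)
    print(run_dag(models, order, parents, "q"))
```

With the example matrix, the decoded edges are 0 -> 1, 0 -> 2, and 1 -> 2. The final output is `m2(q,m0(q),m1(q,m0(q)))`, which shows each model seeing the task input plus every upstream output. In a role-step, a swarm of such matrices would be scored by the utility of the resulting system.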
