MALBO: Optimizing LLM-Based Multi-Agent Teams via Multi-Objective Bayesian Optimization

Antonio Sabbatella

TL;DR

MALBO tackles the problem of optimally composing LLM-based agent teams by framing it as a multi-objective black-box optimization over task accuracy and inference cost. The method relaxes discrete model assignments into a continuous feature space, uses independent Gaussian Process surrogates and the qLogEHVI acquisition function to approximate the Pareto front efficiently, and maps the resulting solutions back to deployable LLM configurations. On the GAIA benchmark, the Bayesian optimization phase matches the average performance of an initial random search while reducing average configuration cost by over 45%, and MALBO identifies specialized, heterogeneous teams that cut cost by up to 65.8% relative to homogeneous baselines while maintaining maximum performance. The work provides actionable, data-driven guidance for deploying cost-effective, highly specialized multi-agent AI systems and lays the groundwork for extending the framework to additional objectives and information sources.
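MALBO searches over a continuous relaxation and then maps solutions back to concrete models. The summary does not describe that mapping, so the sketch below assumes a simple nearest-neighbor projection: each role's block of the continuous vector is snapped to the catalogue LLM whose feature vector is closest in Euclidean distance. The model names, the two features per model (for example, a normalized price and a normalized benchmark score), and the three-role team are hypothetical placeholders, not MALBO's actual feature space.

```python
import numpy as np

# Hypothetical catalogue: each candidate LLM is described by a feature vector.
# The features (e.g. normalized price, normalized benchmark score) are an
# illustrative assumption, not the features used by MALBO.
CATALOGUE = {
    "model-a": np.array([0.10, 0.40]),
    "model-b": np.array([0.55, 0.70]),
    "model-c": np.array([0.90, 0.95]),
}


def project_to_llm(x: np.ndarray, catalogue: dict) -> str:
    """Return the catalogue model whose feature vector is closest (Euclidean) to x."""
    names = list(catalogue)
    feats = np.stack([catalogue[n] for n in names])
    dists = np.linalg.norm(feats - x, axis=1)
    return names[int(np.argmin(dists))]


def project_team(z: np.ndarray, catalogue: dict, n_features: int) -> list:
    """Split a flat continuous team vector into per-role blocks and project each one."""
    blocks = z.reshape(-1, n_features)
    return [project_to_llm(b, catalogue) for b in blocks]


# A continuous candidate for a hypothetical 3-role team (2 features per role).
z = np.array([0.15, 0.35, 0.85, 0.90, 0.50, 0.75])
print(project_team(z, CATALOGUE, n_features=2))  # ['model-a', 'model-c', 'model-b']
```

Rounding only at the end keeps surrogate fitting and acquisition optimization fully continuous. One consequence is that several continuous points can map to the same team, so a practical loop might deduplicate projected configurations before paying for an expensive evaluation.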

Abstract

The optimal assignment of Large Language Models (LLMs) to specialized roles in multi-agent systems is a significant challenge, characterized by a vast combinatorial search space, expensive black-box evaluations, and an inherent trade-off between performance and cost. Current optimization methods focus on single-agent settings and lack a principled framework for this multi-agent, multi-objective problem. This thesis introduces MALBO (Multi-Agent LLM Bayesian Optimization), a systematic framework designed to automate the efficient composition of LLM-based agent teams. We formalize the assignment challenge as a multi-objective optimization problem whose goal is to identify the Pareto front of configurations trading off task accuracy against inference cost. The methodology employs multi-objective Bayesian Optimization (MOBO) with independent Gaussian Process surrogate models. By searching over a continuous feature-space representation of the LLMs, this approach performs sample-efficient exploration guided by expected hypervolume improvement. The primary contribution is a principled and automated methodology that yields a Pareto front of optimal team configurations. Our results demonstrate that the Bayesian optimization phase, compared to an initial random search, maintained comparable average performance while reducing the average configuration cost by over 45%. Furthermore, MALBO identified specialized, heterogeneous teams that achieve cost reductions of up to 65.8% compared to homogeneous baselines while maintaining maximum performance. The framework thus provides a data-driven tool for deploying cost-effective and highly specialized multi-agent AI systems.
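The abstract relies on two quantities: the Pareto front over (accuracy, cost) and the hypervolume that expected hypervolume improvement builds on. The Python sketch below computes both for the two-objective case (maximize accuracy, minimize cost) and the deterministic hypervolume improvement of one new candidate. EHVI-style acquisition functions, such as the qLogEHVI used by MALBO, average this improvement over the GP posterior (in log space, for batches of q candidates); the sketch shows only the underlying deterministic quantity. The observed points and the reference point are illustrative numbers, not results from the paper.

```python
def dominates(p, q):
    """True if p = (accuracy, cost) is no worse than q in both objectives and strictly better in one."""
    (a_p, c_p), (a_q, c_q) = p, q
    return a_p >= a_q and c_p <= c_q and (a_p > a_q or c_p < c_q)


def pareto_front(points):
    """Non-dominated (accuracy, cost) pairs, sorted by ascending cost."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front), key=lambda p: p[1])


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front of `points`, bounded by ref = (min accuracy, max cost)."""
    a_ref, c_ref = ref
    pts = [p for p in pareto_front(points) if p[0] > a_ref and p[1] < c_ref]
    hv = 0.0
    for i, (a, c) in enumerate(pts):
        next_c = pts[i + 1][1] if i + 1 < len(pts) else c_ref
        # Between cost c and next_c, the best accuracy on the front is a.
        hv += (next_c - c) * (a - a_ref)
    return hv


def hypervolume_improvement(front, candidate, ref):
    """Gain in hypervolume from adding one candidate; 0 if the candidate is dominated."""
    return hypervolume_2d(front + [candidate], ref) - hypervolume_2d(front, ref)


# Illustrative (accuracy, cost) observations and reference point.
observed = [(0.40, 1.0), (0.55, 2.0), (0.50, 3.0), (0.60, 5.0), (0.30, 4.0)]
ref = (0.0, 6.0)
front = pareto_front(observed)
print(front)  # [(0.4, 1.0), (0.55, 2.0), (0.6, 5.0)]
print(hypervolume_2d(observed, ref))
print(hypervolume_improvement(front, (0.58, 3.0), ref))
```

A candidate that a current front member dominates adds no hypervolume, so an acquisition function built on this quantity favors configurations that extend the front toward cheaper or more accurate teams.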

Paper Structure

This paper contains 118 sections, 12 equations, 30 figures, 5 tables, and 1 algorithm.

Figures (30)

  • Figure 1: Overview of the multi-agent LLM system architecture. The Manager orchestrates interactions among specialized agents (the Search Agent, Visual QA, and Text Inspector), each equipped with toolsets for web browsing, image analysis, and file analysis, respectively. Both the Manager and the Search Agent operate in iterative loops: the Manager re-plans every 2 steps, while the Search Agent refines its search every 4 steps. The Manager aggregates all agent outputs in an information synthesis phase before passing them to the Reformulator, which produces the final formatted response. The MALBO optimization process dynamically assigns LLM configurations to the key agents to balance cost and performance. In the figure, agents are represented by a robot icon at the top left, and those optimized by MALBO are marked with a red target icon at the top right.
  • Figure 2: The Transformer architecture of vaswani2017attention, including embeddings, positional encoding, the self-attention mechanism, and feed-forward networks.
  • Figure 3: The Scaled Dot-Product Attention mechanism. The dot product of queries and keys is scaled and passed through a softmax function to obtain weights for the value vectors.
  • Figure 4: The architecture of a decoder-only Transformer, which forms the basis of most modern LLMs. The model consists of a stack of decoder layers, each containing a masked multi-head self-attention module and a feed-forward network. The cross-attention module is omitted. In the diagram, $L$ represents the number of stacked decoder layers, $W_1$ and $W_2$ are the weight matrices of the position-wise feed-forward network, while $Q_m$, $K_m$, $V_m$, and $O_m$ denote the query, key, value, and output projection matrices of the multi-head attention mechanism, respectively. Figure adapted from ji2025overview.
  • Figure 5: Overview of the training pipeline for instruction-following language models, adapted from ouyang2022training. The process consists of three main stages: (1) Pre-Training, where a decoder-only Transformer is trained on large unlabeled text corpora; (2) Supervised Fine-Tuning (SFT), where the model is fine-tuned on demonstrations of desired behavior written by human labelers; and (3) Reinforcement Learning from Human Feedback (RLHF), where human preferences are used to train a reward model that then guides iterative improvement of the policy via reinforcement learning.
  • ...and 25 more figures
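Figures 3 and 4 describe the two sublayers of a decoder-only Transformer block. The NumPy sketch below implements scaled dot-product attention with a causal mask, followed by the position-wise feed-forward network with weights W_1 and W_2. It makes simplifying assumptions: a single attention head instead of the multi-head Q_m, K_m, V_m, O_m projections of Figure 4, ReLU in the feed-forward network as in the original Transformer (many modern LLMs use GELU or gated variants instead), no layer normalization, and random weights for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V; mask[i, j] == False blocks query i from key j."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights


def position_wise_ffn(x, W1, b1, W2, b2):
    """ReLU feed-forward network applied to every position independently."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2


def decoder_sublayers(x, Wq, Wk, Wv, Wo, W1, b1, W2, b2):
    """Single-head masked self-attention then the FFN, each with a residual connection."""
    n = x.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))  # token i attends to tokens 0..i only
    attn, _ = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv, mask=causal)
    h = x + attn @ Wo
    return h + position_wise_ffn(h, W1, b1, W2, b2)


rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 5, 8, 32
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(4))
W1, b1 = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff), np.zeros(d_model)
y = decoder_sublayers(x, Wq, Wk, Wv, Wo, W1, b1, W2, b2)
print(y.shape)  # (5, 8)
```

Because of the causal mask, perturbing the last token leaves the outputs at all earlier positions unchanged, which is what allows a decoder-only model to be trained and run for left-to-right generation.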