MALBO: Optimizing LLM-Based Multi-Agent Teams via Multi-Objective Bayesian Optimization
Antonio Sabbatella
TL;DR
MALBO addresses the problem of composing LLM-based agent teams by framing it as a multi-objective black-box optimization over task accuracy and inference cost. The method relaxes discrete model assignments into a continuous feature space, then uses independent Gaussian Process surrogates and the qLogEHVI acquisition function to approximate the Pareto front sample-efficiently, mapping the proposed continuous solutions back to deployable LLM configurations. On the GAIA benchmark, the Bayesian optimization phase cut average configuration cost by over 45% relative to an initial random search while keeping average performance comparable, and MALBO identified specialized, heterogeneous teams that cost up to ~66% less than homogeneous baselines at the same maximum performance. The work provides data-driven guidance for deploying cost-effective, specialized multi-agent AI systems and lays the groundwork for extending the framework to additional objectives and information sources.
Abstract
The optimal assignment of Large Language Models (LLMs) to specialized roles in multi-agent systems is a significant challenge, defined by a vast combinatorial search space, expensive black-box evaluations, and an inherent trade-off between performance and cost. Current optimization methods focus on single-agent settings and lack a principled framework for this multi-agent, multi-objective problem. This thesis introduces MALBO (Multi-Agent LLM Bayesian Optimization), a systematic framework that automates the efficient composition of LLM-based agent teams. We formalize the assignment challenge as a multi-objective optimization problem whose goal is to identify the Pareto front of configurations trading off task accuracy against inference cost. The methodology employs multi-objective Bayesian Optimization (MOBO) with independent Gaussian Process surrogate models. By searching over a continuous feature-space representation of the LLMs, the approach explores the space sample-efficiently, guided by expected hypervolume improvement. The primary contribution is a principled, automated methodology that yields a Pareto front of optimal team configurations. Our results show that the Bayesian optimization phase, compared to an initial random search, maintained comparable average performance while reducing the average configuration cost by over 45%. Furthermore, MALBO identified specialized, heterogeneous teams that reduce cost by up to 65.8% compared to homogeneous baselines while maintaining maximum performance. The framework thus provides a data-driven tool for deploying cost-effective and highly specialized multi-agent AI systems.
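To make the two central notions in the abstract concrete, the following minimal sketch computes a Pareto front over (accuracy, cost) pairs and the two-dimensional hypervolume that the front dominates, which is the quantity whose expected increase a hypervolume-improvement acquisition function scores. It is an illustration under stated assumptions, not the thesis implementation: the function names, the reference point, and the sample evaluations are all hypothetical, and accuracy is assumed to be maximized while cost is minimized.

```python
from typing import List, Tuple

# (accuracy, cost): accuracy is maximized, cost is minimized (assumed convention).
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    acc_a, cost_a = a
    acc_b, cost_b = b
    no_worse = acc_a >= acc_b and cost_a <= cost_b
    strictly_better = acc_a > acc_b or cost_a < cost_b
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, deduplicated and sorted by increasing cost."""
    front = {p for p in points if not any(dominates(q, p) for q in points)}
    return sorted(front, key=lambda p: p[1])


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by the front relative to ref = (worst accuracy, worst cost)."""
    ref_acc, ref_cost = ref
    area = 0.0
    prev_acc = ref_acc
    # On a Pareto front, accuracy rises as cost rises, so each point adds one strip.
    for acc, cost in sorted(front, key=lambda p: p[1]):
        if acc > prev_acc and cost < ref_cost:
            area += (acc - prev_acc) * (ref_cost - cost)
            prev_acc = acc
    return area


# Hypothetical evaluations of four team configurations (made-up numbers).
evaluations = [(0.5, 1.0), (0.75, 2.0), (0.625, 3.0), (0.75, 4.0)]
front = pareto_front(evaluations)
volume = hypervolume_2d(front, ref=(0.0, 5.0))
```

In this toy data the third and fourth configurations are dominated by the second, which reaches at least the same accuracy at lower cost, so only the first two remain on the front. A candidate configuration is valuable to the optimizer to the extent that adding it would enlarge `volume`.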
