Coalitions of Large Language Models Increase the Robustness of AI Agents
Prattyush Mangal, Carol Mak, Theo Kanakis, Timothy Donovan, Dave Braines, Edward Pyzer-Knapp
TL;DR
This paper investigates whether a coalition of open-source pretrained LLMs, each specialised for one sub-task in an agentic workflow, can surpass single-model or fine-tuned approaches in tool-use tasks. By decomposing workflows into planning, slot filling, and response formation, the authors assign each sub-task to the model best suited for it, demonstrating improved robustness and cost efficiency. On the ToolAlpaca benchmarks, the coalition outperforms fine-tuned baselines and single-model configurations, with notable per-task specialisation advantages (e.g., Mistral for planning, Mixtral for slot filling, Flan UL2 for JSON RAG) and evidence that smaller models can beat larger ones on specific tasks. The work suggests that multi-model coalitions offer practical benefits for deploying cost-effective, flexible AI agents, and it motivates future exploration of coalitions that combine fine-tuned models for potential further gains.
Abstract
The emergence of Large Language Models (LLMs) has fundamentally altered the way we interact with digital systems and has led to the pursuit of LLM-powered AI agents to assist in daily workflows. LLMs, whilst powerful and capable of demonstrating some emergent properties, are not logical reasoners and often struggle to perform well at all of the sub-tasks an AI agent carries out to plan and execute a workflow. While existing studies tackle this lack of proficiency through generalised pretraining at a huge scale or specialised fine-tuning for tool use, we assess whether a system comprising a coalition of pretrained LLMs, each exhibiting specialised performance at individual sub-tasks, can match the performance of single-model agents. The coalition-of-models approach showcases its potential for building robustness and reducing the operational costs of these AI agents by leveraging traits exhibited by specific models. Our findings demonstrate that the need for fine-tuning can be mitigated by considering a coalition of pretrained models, and we believe that this approach can be applied to other non-agentic systems which utilise LLMs.
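
To make the coalition idea concrete, the following is a minimal sketch of how an agent might route each sub-task (planning, slot filling, response formation) to a different model. It is an illustration only, not the authors' implementation: the `Coalition` class, the `make_stub` helper, and the prompt format are hypothetical, and the stub callables stand in for real LLM endpoints so that the example runs without any model access.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" here is any callable that maps prompt text to completion text.
# In a real system each one would wrap a call to a different pretrained LLM.
Model = Callable[[str], str]

# The three sub-tasks named in the summary above.
SUBTASKS = ("planning", "slot_filling", "response_formation")


@dataclass
class Coalition:
    """Hypothetical router that assigns one model to each sub-task."""

    models: Dict[str, Model]

    def __post_init__(self) -> None:
        # Fail early if a sub-task has no model assigned to it.
        missing = [task for task in SUBTASKS if task not in self.models]
        if missing:
            raise ValueError(f"no model assigned for: {missing}")

    def run(self, request: str) -> Dict[str, str]:
        """Run the workflow, passing each stage's output to the next stage."""
        trace: Dict[str, str] = {}
        # 1. Planning: decide which tool or action should handle the request.
        plan = self.models["planning"](request)
        trace["planning"] = plan
        # 2. Slot filling: populate the chosen action's parameters.
        call = self.models["slot_filling"](f"{request}\n{plan}")
        trace["slot_filling"] = call
        # 3. Response formation: turn the result into a user-facing answer.
        trace["response_formation"] = self.models["response_formation"](call)
        return trace


def make_stub(name: str) -> Model:
    """Return a fake model that tags the last line of its prompt (assumption)."""
    return lambda prompt: f"[{name}] {prompt.splitlines()[-1]}"


coalition = Coalition(
    models={
        "planning": make_stub("planner"),
        "slot_filling": make_stub("slot-filler"),
        "response_formation": make_stub("responder"),
    }
)
trace = coalition.run("What is the weather in Paris?")
for task, output in trace.items():
    print(f"{task}: {output}")
```

Because the assignment of models to sub-tasks is just a dictionary in this sketch, swapping the model used for one sub-task does not affect the others, which is the kind of flexibility the coalition approach relies on.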
