Harnessing Multiple Large Language Models: A Survey on LLM Ensemble

Zhijun Chen; Jingzheng Li; Pengpeng Chen; Zhuoran Li; Kai Sun; Yuankai Luo; Qianren Mao; Ming Li; Likang Xiao; Dingqi Yang; Yikun Ban; Hailong Sun; Philip S. Yu

TL;DR

This survey addresses how to leverage multiple large language models through structured ensemble strategies, categorizing methods into before-, during-, and after-inference paradigms and detailing subtypes such as pretrained routers, token/span/process-level fusion, and non-cascade versus cascade approaches. It synthesizes a broad set of techniques, benchmarks (e.g., MixInstruct, RouterBench), and applications, offering a unified taxonomy and critical discussion of trade-offs between performance and cost. The authors identify key limitations, such as coarse span segmentation and the need for principled unsupervised cascade methods, and outline concrete directions to advance the field, including principled segmentation, unsupervised non-cascade after-inference methods, and general cascade frameworks. Overall, the paper provides a comprehensive roadmap for researchers to design, evaluate, and apply LLM ensemble techniques in real-world settings.

Abstract

LLM Ensemble -- which involves the comprehensive use of multiple large language models (LLMs), each aimed at handling user queries during downstream inference, to benefit from their individual strengths -- has gained substantial attention recently. The widespread availability of LLMs, coupled with their varying strengths and out-of-the-box usability, has profoundly advanced the field of LLM Ensemble. This paper presents the first systematic review of recent developments in LLM Ensemble. First, we introduce our taxonomy of LLM Ensemble and discuss several related research problems. Then, we provide a more in-depth classification of the methods under the broad categories of "ensemble-before-inference, ensemble-during-inference, ensemble-after-inference", and review all relevant methods. Finally, we introduce related benchmarks and applications, summarize existing studies, and suggest several future research directions. A curated list of papers on LLM Ensemble is available at https://github.com/junchenzhi/Awesome-LLM-Ensemble.
