Efficient Dynamic Ensembling for Multiple LLM Experts

Jinwu Hu, Yufeng Wang, Shuhai Zhang, Kai Zhou, Guohao Chen, Yu Hu, Bin Xiao, Mingkui Tan

TL;DR

This work addresses the challenge of efficiently combining multiple, non-homologous LLM experts by modeling ensemble reasoning as a Markov Decision Process and training a DER-Agent to dynamically route questions through a sequence of LLMs. A Knowledge Transfer Prompt enables successive models to leverage prior outputs, while a carefully designed reward balances answer quality and computational cost, optimized via Proximal Policy Optimization. Empirical results on MixInstruct, GSM8K, and multi-domain tasks show DER achieves competitive or superior performance with substantially fewer inference parameters than running all experts, validating the practicality of sequential, knowledge-transfer-driven ensembling. The approach offers a scalable path to harness diverse LLM strengths in real-world settings with constrained compute.

Abstract

LLMs have demonstrated impressive performance across various language tasks. However, the strengths of LLMs vary with their architectures, model sizes, and training-data domains. Ensemble reasoning that combines the strengths of different LLM experts is therefore critical to achieving consistent and satisfactory performance on diverse inputs across a wide range of tasks. Existing LLM ensemble methods, however, are either computationally intensive or incapable of leveraging complementary knowledge among LLM experts for various inputs. In this paper, we propose an efficient Dynamic Ensemble Reasoning paradigm, called DER, to integrate the strengths of multiple LLM experts conditioned on dynamic inputs. Specifically, we model the LLM ensemble reasoning problem as a Markov Decision Process, wherein an agent sequentially takes inputs to request knowledge from an LLM candidate and passes the output to a subsequent LLM candidate. Moreover, we devise a reward function to train a DER-Agent to dynamically select an optimal answering route given the input questions, aiming to achieve the highest performance with as few computational resources as possible. Finally, to fully transfer the expert knowledge from the prior LLMs, we develop a Knowledge Transfer Prompt that enables the subsequent LLM candidates to transfer complementary knowledge effectively. Experiments demonstrate that our method uses fewer computational resources to achieve better performance compared to state-of-the-art baselines. Code and appendix are available at https://github.com/Fhujinwu/DER

Paper Structure

This paper contains 13 sections, 6 equations, 2 figures, 7 tables, 1 algorithm.

Figures (2)

  • Figure 1: Illustration of different LLM ensemble strategies. (a) Ensemble with MoEs. (b) Ensemble with the agent.
  • Figure 2: General diagram of DER. We formulate the LLM ensemble as an MDP and train DER-Agent to select an optimal answering route for inputs. At step $t$, the DER-Agent takes $s_t=[Q: x, A: \hat{y}_{t-1}]$ as input and selects an LLM $\mathcal{M}_{a_t}$ to continue answering the question given the existing answer, leading to a new answer $\hat{y}_{t}$. We calculate a reward $r_t$ with $\hat{y}_{t}$ and update the state to $s_{t+1}$. This process loops until the answer is evaluated as satisfactory or the maximum trajectory length is reached.
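
The loop described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `Expert`, `build_state`, and `run_route`, the toy policy and scorer, the stopping threshold, and the reward form (quality minus a weighted cost) are all hypothetical stand-ins. In the paper the policy is a DER-Agent trained with Proximal Policy Optimization, and the actual reward and Knowledge Transfer Prompt are defined there.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Expert:
    """A candidate LLM M_a (hypothetical stand-in for a real model)."""
    name: str
    cost: float  # assumed relative inference cost, e.g. parameter count
    answer: Callable[[str, str], str]  # (question, previous answer) -> new answer


def build_state(question: str, prev_answer: str) -> str:
    """State s_t = [Q: x, A: y_{t-1}], following the caption's notation."""
    return f"Q: {question}\nA: {prev_answer}"


def run_route(
    question: str,
    experts: List[Expert],
    policy: Callable[[str], int],
    score: Callable[[str, str], float],
    threshold: float = 0.9,    # assumed "satisfactory" cutoff
    max_steps: int = 4,        # assumed maximum trajectory length
    cost_weight: float = 0.1,  # assumed quality/cost trade-off
) -> Tuple[str, List[Tuple[str, float]]]:
    answer = ""
    trajectory: List[Tuple[str, float]] = []
    for _ in range(max_steps):
        state = build_state(question, answer)
        action = policy(state)                    # a_t: index of the chosen expert
        expert = experts[action]
        answer = expert.answer(question, answer)  # new answer y_t
        quality = score(question, answer)
        # Assumed reward shape: reward quality, penalize compute.
        reward = quality - cost_weight * expert.cost
        trajectory.append((expert.name, reward))
        if quality >= threshold:                  # stop once satisfactory
            break
    return answer, trajectory


# Toy usage: a cheap expert drafts, an expensive expert refines the draft.
experts = [
    Expert("small", 1.0, lambda q, prev: prev + "draft "),
    Expert("large", 3.0, lambda q, prev: prev + "refined"),
]


def toy_policy(state: str) -> int:
    # Pick the cheap expert when no answer exists yet, else the large one.
    return 0 if state.endswith("A: ") else 1


def toy_score(question: str, answer: str) -> float:
    return 1.0 if answer.endswith("refined") else 0.5


final_answer, trajectory = run_route("What is 2+2?", experts, toy_policy, toy_score)
print(final_answer)
print([name for name, _ in trajectory])
```

In this toy run the route visits the small expert first and stops after the large expert produces an answer that clears the threshold, so the third and fourth steps are never executed. That early exit is the mechanism by which a sequential router can spend less compute than querying every expert.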