An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang

TL;DR

This work introduces Expert-Token-Routing (ETR), a unified generalist framework that seamlessly incorporates multiple expert LLMs by encoding each expert as a special token in a frozen meta LLM. Routing to an expert is performed as part of standard next-token prediction, with expert token embeddings learned from an expert query set so that the meta LLM learns when to delegate to the right specialist. Training updates only the expert token embeddings, enabling plug-in extension of new experts without retraining the backbone model. Across six expert domains on the MMLU-Expert benchmark, ETR achieves higher overall accuracy and expert routing accuracy than prompting-based and router-based baselines, while maintaining user-facing simplicity and showing robustness to dynamic extension with minimal performance loss. The results demonstrate the practicality of building scalable, real-time multi-expert systems by unifying expert knowledge under a single, generalist interface.

Abstract

We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM in the same way it generates a new token. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction datasets but also allows new expert LLMs to be added dynamically in a plug-and-play manner. It also conceals the detailed collaboration process from the user, so that interaction feels like using a single LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building a generalist LLM system by synergizing multiple expert LLMs.
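The routing idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the hidden states are hand-made vectors standing in for a meta LLM's final hidden state, the cross-entropy update is an assumed training objective, and every name (`word_head`, `expert_head`, `route`, `train_expert_embeddings`) is hypothetical. It only shows expert tokens being scored in the same softmax as word tokens while the word rows stay frozen.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def next_token_distribution(hidden, word_head, expert_head):
    """Score word tokens and expert tokens together in one softmax."""
    logits = [dot(row, hidden) for row in word_head + expert_head]
    return softmax(logits)


def train_expert_embeddings(queries, word_head, expert_head, lr=0.5, epochs=200):
    """Assumed objective: cross-entropy on the expert label of each query.

    Only rows of expert_head are updated (in place); word_head is never
    written to, mirroring the frozen meta LLM described in the text.
    """
    n_words = len(word_head)
    for _ in range(epochs):
        for hidden, expert_id in queries:
            probs = next_token_distribution(hidden, word_head, expert_head)
            for k, row in enumerate(expert_head):
                target = 1.0 if k == expert_id else 0.0
                grad_logit = probs[n_words + k] - target
                for j in range(len(row)):
                    row[j] -= lr * grad_logit * hidden[j]
    return expert_head


def route(hidden, word_head, expert_head):
    """Return ("expert", k) if an expert token wins, else ("word", i)."""
    probs = next_token_distribution(hidden, word_head, expert_head)
    best = max(range(len(probs)), key=probs.__getitem__)
    n_words = len(word_head)
    if best >= n_words:
        return ("expert", best - n_words)
    return ("word", best)
```

In this sketch, adding a new expert amounts to appending one more row to `expert_head` and training that row on its own query set, which is the plug-and-play property the abstract describes.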

Paper Structure (28 sections, 4 equations, 7 figures, 8 tables, 1 algorithm)

Figures (7)

  • Figure 1: The framework of ETR. During ETR's decoding process, the meta LLM serves as the default active LLM. When an expert token is predicted, the active LLM switches to the corresponding expert LLM. The expert tokens are appended to the frozen language modeling head of the meta LLM, where they are treated the same as word tokens during next-token prediction.
  • Figure 2: Expert LLMs' accuracy on MMLU-Expert.
  • Figure 3: Collection process of the multi-domain knowledge dataset synthesized through GPT-4. To prevent potential data leakage, synthetic questions with a high BERTScore against any question in the test set are filtered out.
  • Figure 4: Routing expert distribution of Meta-Prompting-E (top) and ETR (bottom).
  • Figure 5: Overall Accuracy (Left) and Expert Routing Accuracy (Right) on MMLU-Expert as the size of the expert query set per expert increases.
  • ...and 2 more figures
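
The leakage filter mentioned in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the caption names BERTScore, but the sketch swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity so it runs without model downloads, and the `threshold` value and function names are hypothetical rather than taken from the paper.

```python
from difflib import SequenceMatcher


def stand_in_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for BERTScore."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_leaked(synthetic, test_questions, threshold=0.8,
                  similarity=stand_in_similarity):
    """Keep synthetic questions that are not too similar to any test question.

    threshold is an assumed cut-off, not a value reported in the source.
    """
    kept = []
    for question in synthetic:
        if all(similarity(question, t) < threshold for t in test_questions):
            kept.append(question)
    return kept
```

Passing a different `similarity` callable, for example one that wraps a BERTScore F1 computation, would bring the sketch closer to the procedure the caption describes.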