CCoE: A Compact and Efficient LLM Framework with Multi-Expert Collaboration for Resource-Limited Settings
Shaomang Huang, Jianfeng Pan, Min Peng, Hanzhong Zheng
TL;DR
CCoE tackles the challenge of deploying multiple domain-specific LLMs under resource constraints by unifying independently trained domain experts as subnetworks on a shared backbone. The architecture is partitioned so that each expert path has $L = l_b + l_i$ layers: $l_b$ shared backbone layers followed by $l_i$ layers that belong to expert $i$. For a query $q$, the output of expert $i$ is $O_i = E_{i,l_i}(E_{b,l_b}(q)) \cdot \mathbf{1}_{\mathcal{R}(q)=i}$, where $E_{b,l_b}$ is the backbone, $E_{i,l_i}$ is the expert subnetwork, and the indicator keeps only the expert selected by the router $\mathcal{R}$. Each expert is trained with its own loss $\mathcal{L}_{i} = -\sum_{t=1}^{T} \log p(y_t \mid y_{<t}; \theta^*_b, \theta_i)$, where $\theta^*_b$ denotes the shared backbone parameters and $\theta_i$ those of expert $i$. Two routing schemes, rule-based gating and expert planning, enable flexible, scalable collaboration among experts; the planning component uses scores $h^{(t)}_{j,i}$ to select the best-matched experts. Across Math, Code, Law, Medical, and Text-to-SQL, CCoE performs comparably to domain-specific LLMs while reducing memory usage by 61.3% relative to multi-domain model ensembles and improving inference efficiency by 0.76x over parameter-efficient multi-expert integration approaches, which makes it well suited to resource-limited deployments. The framework also supports rapid expansion and knowledge updates through push-and-pop operations on experts, making it practical for real-world, cross-domain reasoning tasks.
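The routed forward pass, the per-expert loss, and the push-and-pop operations can be illustrated with a toy, framework-free Python sketch. Everything in it is hypothetical: `CCoESketch`, `push`, `pop`, `rule_based_router`, `expert_nll`, and `scale` are illustrative names, "layers" are plain functions on lists of floats rather than transformer blocks, the keyword rules merely stand in for CCoE's rule-based gating, and expert planning is not modeled. Because the indicator $\mathbf{1}_{\mathcal{R}(q)=i}$ zeroes every path except the routed one, the sketch simply runs the backbone and then only the selected expert.

```python
import math
from typing import Callable, Dict, List

# Hypothetical: a "layer" maps a hidden vector to a hidden vector.
Layer = Callable[[List[float]], List[float]]


def scale(c: float) -> Layer:
    """Toy layer that multiplies every hidden unit by c."""
    return lambda h: [c * x for x in h]


def run_layers(layers: List[Layer], h: List[float]) -> List[float]:
    """Apply a stack of layers in order."""
    for layer in layers:
        h = layer(h)
    return h


class CCoESketch:
    """Shared backbone E_b (l_b layers) plus one expert stack E_i (l_i layers) per domain."""

    def __init__(self, backbone: List[Layer]) -> None:
        self.backbone = backbone                   # stored once, shared by all experts
        self.experts: Dict[str, List[Layer]] = {}  # domain name -> expert layers

    def push(self, domain: str, layers: List[Layer]) -> None:
        """Add or replace one expert without touching the backbone or other experts."""
        self.experts[domain] = layers

    def pop(self, domain: str) -> List[Layer]:
        """Remove one expert and return its layers."""
        return self.experts.pop(domain)

    def forward(self, q: List[float], domain: str) -> List[float]:
        """O_i = E_i(E_b(q)) for the routed expert i; all other paths are skipped."""
        h = run_layers(self.backbone, q)
        return run_layers(self.experts[domain], h)


def rule_based_router(text: str, rules: Dict[str, List[str]], default: str) -> str:
    """Hypothetical keyword gate standing in for R(q): the first matching domain wins."""
    lowered = text.lower()
    for domain, keywords in rules.items():
        if any(k in lowered for k in keywords):
            return domain
    return default


def expert_nll(target_token_probs: List[float]) -> float:
    """L_i = -sum_t log p(y_t | y_<t).

    In CCoE these probabilities would come from the backbone (theta*_b) followed
    by expert i (theta_i); here they are passed in directly.
    """
    return -sum(math.log(p) for p in target_token_probs)


# Usage: the backbone doubles the input; the math expert scales by 10, the code expert negates.
model = CCoESketch(backbone=[scale(2.0)])
model.push("math", [scale(10.0)])
model.push("code", [scale(-1.0)])
rules = {"math": ["integral", "solve"], "code": ["python", "function"]}
domain = rule_based_router("Solve x + 1 = 3", rules, default="math")
out = model.forward([1.0, 0.5], domain)
print(domain, out)  # math [20.0, 10.0]
```

Keeping experts in a dictionary keyed by domain means that adding, replacing, or removing one expert never touches the backbone or the other experts, which is the property push-and-pop updates rely on.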
Abstract
Large Language Models (LLMs) have achieved exceptional performance across diverse domains through training on massive datasets. However, scaling LLMs to support multiple downstream domain applications remains a significant challenge, especially under resource constraints. Existing approaches often struggle to balance performance across multiple domains with resource efficiency, limiting their broader applicability. To address this, we introduce the CCoE architecture, a modular framework that seamlessly integrates domain-specific experts into a unified LLM. By leveraging independently trained expert subnetworks on a shared backbone partition, CCoE achieves state-of-the-art performance while significantly reducing the resource requirements for multi-expert deployments. Furthermore, rule-based gating and expert planning in CCoE enable flexible task allocation, promoting expert collaboration to handle complex reasoning tasks. CCoE not only reduces inference costs but also provides a flexible and scalable solution for integrating domain expertise across diverse applications. Experiments on five domains demonstrate that CCoE achieves comparable performance to current domain-specific LLMs. Moreover, compared to existing multi-domain model ensemble methods, CCoE reduces memory usage by 61.3%, while improving inference efficiency by 0.76x over parameter-efficient multi-expert integration approaches.
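A back-of-the-envelope layer count shows why a shared backbone saves memory relative to an ensemble of separate models. Assuming every layer has the same size, $N$ separate full models store $N L$ layers, whereas CCoE stores $l_b + \sum_i l_i$; with equal $l_i$ the fraction saved is $1 - \frac{l_b + N l_i}{N L} = \frac{(N-1)\,l_b}{N L}$. The values below ($N = 5$, $L = 32$, $l_b = 24$) are hypothetical and chosen only for illustration. They are not the configuration behind the reported 61.3%, and the count ignores embeddings, the output head, activations, and the KV cache.

```python
from typing import List


def ensemble_layers(num_experts: int, total_layers: int) -> int:
    """Separate full models: each of the N experts stores all L layers."""
    return num_experts * total_layers


def ccoe_layers(backbone_layers: int, expert_layers: List[int]) -> int:
    """Shared backbone stored once (l_b) plus each expert's own l_i layers."""
    return backbone_layers + sum(expert_layers)


# Hypothetical configuration: 5 domains, L = 32, l_b = 24, so l_i = 8 for every expert.
L, l_b, n = 32, 24, 5
experts = [L - l_b] * n

ens = ensemble_layers(n, L)        # 5 * 32 = 160
ccoe = ccoe_layers(l_b, experts)   # 24 + 5 * 8 = 64
saving = 1 - ccoe / ens            # closed form: (n - 1) * l_b / (n * L) = 0.6
print(f"ensemble={ens}, ccoe={ccoe}, saving={saving:.1%}")
```

Under this simplified accounting, the saving grows with the number of domains and approaches $l_b / L$ as $N$ increases, so a deeper shared backbone matters more as experts are added.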
