MoDEM: Mixture of Domain Expert Models

Toby Simonds, Kemal Kurniawan, Jey Han Lau

TL;DR

MoDEM introduces a Mixture of Domain Expert Models that routes user prompts to domain-specific experts via a compact DeBERTa-v3-large router. The approach achieves state-of-the-art or near-state-of-the-art results across multiple benchmarks while markedly reducing inference costs, especially in math-related and multi-domain tasks. Key contributions include a modular router-expert architecture, a diverse training dataset with synthetic domain data, and a comprehensive cost-efficiency analysis demonstrating superior price-to-performance relative to large general-purpose models. The study argues for a paradigm shift toward ecosystems of smaller, specialized models coupled with intelligent routing to sustain progress in AI under compute constraints.

Abstract

We propose a novel approach to enhancing the performance and efficiency of large language models (LLMs) by combining domain prompt routing with domain-specialized models. We introduce a system that utilizes a BERT-based router to direct incoming prompts to the most appropriate domain expert model. These expert models are specifically tuned for domains such as health, mathematics and science. Our research demonstrates that this approach can significantly outperform general-purpose models of comparable size, leading to a superior performance-to-cost ratio across various benchmarks. The implications of this study suggest a potential paradigm shift in LLM development and deployment. Rather than focusing solely on creating increasingly large, general-purpose models, the future of AI may lie in developing ecosystems of smaller, highly specialized models coupled with sophisticated routing systems. This approach could lead to more efficient resource utilization, reduced computational costs, and superior overall performance.
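The routing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-based `keyword_router` is a hypothetical stand-in for the trained BERT-based classifier, the expert functions are stubs in place of real domain-tuned models, and the `other` fallback to a general model is an assumption made here so that every prompt has a destination.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def keyword_router(prompt: str) -> str:
    """Hypothetical stand-in for the trained classifier router.

    A real system would run a fine-tuned encoder over the prompt and
    return the highest-scoring domain label; keywords are used here
    only to keep the sketch self-contained.
    """
    text = prompt.lower()
    keywords = {
        "health": ("symptom", "diagnosis", "medication"),
        "math": ("integral", "equation", "theorem"),
        "science": ("molecule", "gravity", "photosynthesis"),
    }
    for domain, words in keywords.items():
        if any(word in text for word in words):
            return domain
    return "other"  # assumed fallback label, not taken from the paper


def make_stub_expert(name: str) -> Callable[[str], str]:
    """Build a placeholder expert that just tags the prompt with its name."""
    def expert(prompt: str) -> str:
        return f"[{name} expert] {prompt}"
    return expert


@dataclass
class RoutedSystem:
    """Sends each prompt to the expert chosen by the router."""
    router: Callable[[str], str]
    experts: Dict[str, Callable[[str], str]]
    fallback: str = "other"

    def answer(self, prompt: str) -> str:
        domain = self.router(prompt)
        # Unknown labels are served by the fallback expert.
        expert = self.experts.get(domain, self.experts[self.fallback])
        return expert(prompt)


system = RoutedSystem(
    router=keyword_router,
    experts={d: make_stub_expert(d) for d in ("health", "math", "science", "other")},
)
print(system.answer("Solve the equation x + 2 = 5"))
```

Because the router and the experts are passed in as plain callables, either side can be swapped independently, which mirrors the modular router-expert design the TL;DR describes.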

Paper Structure

This paper contains 22 sections, 1 figure, 7 tables.

Figures (1)

  • Figure 1: MoDEM architecture diagram