Beyond Monoliths: Expert Orchestration for More Capable, Democratic, and Safe Language Models

Philip Quirke, Narmeen Oozeer, Chaithanya Bandi, Amir Abdullah, Jason Hoelscher-Obermaier, Jeff M. Phillips, Joshua Greaves, Clement Neo, Michael Lan, Fazl Barez, Shriyash Upadhyay

TL;DR

The paper argues that the current emphasis on ever larger generalist LLMs concentrates power and poses safety and governance risks. It proposes Expert Orchestration (EO), a framework of independent Judges and Routers that selects among specialized and general systems for each user query, aiming to improve performance, transparency, and safety while enabling democratic participation in AI development. By leveraging domain-specific experts and modular evaluation, EO promises higher-quality answers at lower average costs, and it supports edge-device deployment alongside cloud-based resources. If successful, EO could accelerate robust, human-aligned AI and broaden participation in AI innovation beyond a few dominant players.

Abstract

This position paper argues that the prevailing trajectory toward ever larger, more expensive generalist foundation models controlled by a handful of companies limits innovation and constrains progress. We challenge this approach by advocating for an "Expert Orchestration" (EO) framework as a superior alternative that democratizes LLM advancement. Our proposed framework intelligently selects from many existing models based on query requirements and decomposition, focusing on identifying what models do well rather than how they work internally. Independent "judge" models assess various models' capabilities across dimensions that matter to users, while "router" systems direct queries to the most appropriate specialists within an approved set. This approach delivers superior performance by leveraging targeted expertise rather than forcing costly generalist models to address all user requirements. EO enhances transparency, control, alignment, performance, safety and democratic participation through intelligent model selection.
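The judge-and-router split described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the `ModelProfile` type, the `route` function, the linear "quality minus weighted cost" utility, and the toy scores and costs are all hypothetical. Judge outputs are represented as precomputed per-dimension scores, and the router simply picks the approved model with the highest utility for a query's requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical record of what judges report about one model."""
    name: str
    cost: float               # assumed cost per query, arbitrary units
    scores: dict[str, float]  # assumed judge scores in [0, 1], keyed by dimension


def route(requirements: dict[str, float],
          approved: list[ModelProfile],
          cost_weight: float = 0.1) -> ModelProfile:
    """Return the approved model with the best quality-versus-cost utility.

    `requirements` maps each dimension the query cares about to a weight.
    The utility is an assumed linear form: weighted judge score minus
    `cost_weight` times the model's cost.
    """
    if not approved:
        raise ValueError("approved set is empty")

    def utility(model: ModelProfile) -> float:
        quality = sum(weight * model.scores.get(dim, 0.0)
                      for dim, weight in requirements.items())
        return quality - cost_weight * model.cost

    return max(approved, key=utility)


# Toy approved set: one expensive generalist and two cheap specialists.
approved = [
    ModelProfile("generalist", cost=10.0, scores={"math": 0.8, "law": 0.8}),
    ModelProfile("math-specialist", cost=1.0, scores={"math": 0.9, "law": 0.2}),
    ModelProfile("law-specialist", cost=2.0, scores={"math": 0.1, "law": 0.9}),
]

print(route({"math": 1.0}, approved).name)
print(route({"law": 1.0}, approved).name)
print(route({"math": 1.0, "law": 1.0}, approved, cost_weight=0.0).name)
```

With these toy numbers, single-domain queries go to the matching specialist, while a query that needs both domains and ignores cost falls back to the generalist. A real router would also need the query categorization and decomposition the abstract mentions, which this sketch leaves out.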

Paper Structure

This paper contains 12 sections, 2 figures.

Figures (2)

  • Figure 1: Expert Orchestration framework: The router leverages cached model capability analyses to categorize incoming prompts and dynamically route queries to the optimal model — whether Transformer-based or alternative architectures, large or small, specialist or generalist, cloud-based or edge-deployed — maximizing response quality while minimizing cost.
  • Figure 2: An experimental "meta-model" (red line, up and to the left) uses judges and routers to combine many models. It outperforms any single model (yellow points). Further research is expected to move the quality/cost Pareto curve further up and to the left (green arrows).