RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models

Shuhao Chen, Weisen Jiang, Baijiong Lin, James T. Kwok, Yu Zhang

TL;DR

The paper tackles how to efficiently assemble multiple LLMs by learning a query-based router. It proposes RouterDC, a lightweight encoder plus LLM-embedding framework trained with two contrastive losses: a sample-LLM loss that favors top-performing LLMs for each query, and a sample-sample loss that stabilizes training through query clustering. Empirical results show state-of-the-art performance on multiple reasoning and code tasks in both in-distribution and out-of-distribution settings, with reported accuracy gains of +2.76% and +1.90% respectively, and substantially faster inference than voting-based ensembling. The approach is parameter-efficient and practical for real-world deployment, with potential extensions to interactive chat scenarios.

Abstract

Recent works show that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. To achieve this, routing is a promising method, which learns a router to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well for a query. To address this problem, in this paper, we propose a method called query-based Router by Dual Contrastive learning (RouterDC). The RouterDC model consists of an encoder and LLM embeddings, and we propose two contrastive learning losses to train the RouterDC model. Experimental results show that RouterDC is effective in assembling LLMs and largely outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution (+2.76%) and out-of-distribution (+1.90%) tasks. Source code is available at https://github.com/shuhao02/RouterDC.
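
To make the idea of a sample-LLM contrastive loss concrete, here is a minimal sketch, not the authors' implementation. It assumes cosine similarity between the query embedding and each LLM embedding, and an InfoNCE-style objective that pulls the query toward the LLMs that answered it well (`positives`) and pushes it away from those that answered poorly (`negatives`). The function names, the choice of similarity, and the way positives and negatives are selected are all assumptions made for illustration; the paper's exact definition may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_llm_loss(query_emb, llm_embs, positives, negatives):
    """InfoNCE-style loss for one query (hypothetical form).

    query_emb: embedding of the query produced by the encoder.
    llm_embs:  list of learnable LLM embeddings, one per candidate LLM.
    positives: indices of LLMs that performed well on this query.
    negatives: indices of LLMs that performed poorly on this query.
    """
    sims = [cosine(query_emb, k) for k in llm_embs]
    # Shared denominator mass contributed by the poorly performing LLMs.
    neg_mass = sum(math.exp(sims[t]) for t in negatives)
    loss = 0.0
    for t in positives:
        pos = math.exp(sims[t])
        # Each positive LLM is contrasted against all negatives.
        loss += -math.log(pos / (pos + neg_mass))
    return loss


# Toy example: three LLM embeddings in 2-D, one positive and one negative.
toy_llm_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
toy_loss = sample_llm_loss([1.0, 0.0], toy_llm_embs, positives=[0], negatives=[2])
```

Because every well-performing LLM is treated as a positive, a loss of this shape does not force the router to single out one winner when several LLMs answer a query correctly, which is the failure mode of existing routers that the abstract points to.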

Paper Structure (24 sections, 6 equations, 16 figures, 13 tables, 1 algorithm)

Figures (16)

  • Figure 1: The inference pipeline of RouterDC. The encoder $\mathcal{E}$ and the LLM embeddings ${\bf k}$'s are trainable parameters, while the LLMs are frozen.
  • Figure 2: Testing accuracy of candidate LLMs and our RouterDC on in-distribution and out-of-distribution tasks.
  • Figure 3: Score distributions of LLMs on an example query (w/ or w/o normalization).
  • Figure 4: Distribution of the score difference between the top two LLMs.
  • Figure 5: Effects of $\lambda$.
  • ...and 11 more figures
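
The inference pipeline described in the Figure 1 caption can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the released code: it assumes the query has already been embedded by the encoder, that routing picks the LLM whose embedding has the highest cosine similarity to the query embedding, and that the LLM names (`llm_math`, `llm_code`) and 2-D vectors are hypothetical placeholders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def route(query_emb, llm_embs):
    """Return the name of the LLM whose embedding best matches the query.

    query_emb: output of the (trainable) encoder for one query.
    llm_embs:  dict mapping an LLM name to its (trainable) embedding.
    The LLMs themselves stay frozen; only the selected one is called.
    """
    return max(llm_embs, key=lambda name: cosine(query_emb, llm_embs[name]))


# Hypothetical LLM embeddings and a query embedding, for illustration only.
toy_llm_embs = {"llm_math": [1.0, 0.0], "llm_code": [0.0, 1.0]}
chosen = route([0.9, 0.1], toy_llm_embs)
```

Only one candidate LLM is invoked per query under this scheme, which is consistent with the TL;DR's claim of faster inference than voting-based ensembling, where every candidate must generate an answer.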