EmbedLLM: Learning Compact Representations of Large Language Models

Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran

TL;DR

As the diversity of large language models grows, EmbedLLM introduces a unified, compact embedding framework, learned via a reconstruction objective, that captures salient model characteristics. It leverages an encoder-decoder architecture and a Matrix Factorization-like training objective to enable downstream tasks such as correctness forecasting, model routing, and benchmark accuracy prediction with minimal retraining. Empirical results show that the embeddings outperform baselines in routing while keeping routing fast and low-cost, make statistically significant predictions of benchmark performance without extra inference, and reveal intrinsic information about both models and benchmarks. The approach is validated on 112 open-source models and 36k questions, and the dataset and code are open-sourced to facilitate further research.
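To make the Matrix Factorization-like objective more concrete, here is a minimal sketch of correctness forecasting followed by routing, under simplifying assumptions: every model and every question gets a free learned vector, the probability that a model answers a question correctly is the sigmoid of their dot product, and training minimizes binary cross-entropy with plain SGD. EmbedLLM itself uses an encoder-decoder architecture, so this is not the authors' architecture; the toy correctness matrix and all names (`train`, `predict`, `bce_loss`, `route`) are hypothetical.

```python
import math
import random

# Hypothetical toy data: CORRECTNESS[m][q] = 1 if model m answered question q correctly.
CORRECTNESS = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
]
DIM = 3  # embedding size (assumption for this toy example)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(model_vec, question_vec) -> float:
    """Predicted probability that the model answers the question correctly."""
    return sigmoid(sum(a * b for a, b in zip(model_vec, question_vec)))


def bce_loss(correctness, models, questions) -> float:
    """Mean binary cross-entropy over all (model, question) pairs."""
    total = 0.0
    for m, row in enumerate(correctness):
        for q, y in enumerate(row):
            # Clamp to avoid log(0) if a prediction saturates.
            p = min(max(predict(models[m], questions[q]), 1e-12), 1 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / (len(correctness) * len(correctness[0]))


def train(correctness, dim=DIM, epochs=500, lr=0.1, seed=0):
    """Fit model and question embeddings with SGD on binary cross-entropy."""
    rng = random.Random(seed)
    n_models, n_questions = len(correctness), len(correctness[0])
    models = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_models)]
    questions = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_questions)]
    pairs = [(m, q) for m in range(n_models) for q in range(n_questions)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for m, q in pairs:
            # For a sigmoid output, d(BCE)/d(logit) = p - y.
            err = predict(models[m], questions[q]) - correctness[m][q]
            for k in range(dim):
                # Compute both gradients from the pre-update values.
                grad_m = err * questions[q][k]
                grad_q = err * models[m][k]
                models[m][k] -= lr * grad_m
                questions[q][k] -= lr * grad_q
    return models, questions


def route(models, question_vec) -> int:
    """Send the question to the model with the highest predicted correctness."""
    return max(range(len(models)), key=lambda m: predict(models[m], question_vec))


initial = train(CORRECTNESS, epochs=0)  # same seed, so these are the starting vectors
models, questions = train(CORRECTNESS)
print(f"BCE before: {bce_loss(CORRECTNESS, *initial):.3f}")
print(f"BCE after:  {bce_loss(CORRECTNESS, models, questions):.3f}")
print("question 2 routed to model", route(models, questions[2]))
```

Free per-question vectors only cover questions seen during training; handling unseen prompts, as a router must, requires something that maps question text to `question_vec`, which is the role an encoder would play.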

Abstract

With hundreds of thousands of language models available on Hugging Face today, efficiently evaluating and utilizing these models across various downstream tasks has become increasingly critical. Many existing methods repeatedly learn task-specific representations of Large Language Models (LLMs), which is inefficient in both time and computational resources. To address this, we propose EmbedLLM, a framework designed to learn compact vector representations of LLMs that facilitate downstream applications involving many models, such as model routing. We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness. Empirical results show that EmbedLLM outperforms prior methods in model routing in both accuracy and latency. Additionally, we demonstrate that our method can forecast a model's performance on multiple benchmarks without incurring additional inference cost. Extensive probing experiments validate that the learned embeddings capture key model characteristics, e.g., whether the model is specialized for coding tasks, even without being explicitly trained on them. We open-source our dataset, code, and embedder to facilitate further research and application.
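The probing idea in the abstract, and the Figure 1 pipeline of training an additional linear layer on top of frozen embeddings, can be illustrated with a minimal sketch: a logistic-regression head fit by full-batch gradient descent while the embeddings stay fixed. The embedding values, the `IS_CODING_MODEL` labels, and the function names are hypothetical, and the authors' probing setup may differ.

```python
import math

# Hypothetical frozen embeddings, one per model, produced by a pretrained embedder.
EMBEDDINGS = [
    [0.9, 0.1, -0.2],
    [0.8, -0.3, 0.1],
    [-0.7, 0.2, 0.4],
    [-0.6, -0.1, -0.3],
]
# Hypothetical probe labels: 1 = model is specialized for coding tasks.
IS_CODING_MODEL = [1, 1, 0, 0]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(w, b, x) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_probe(xs, ys, lr=0.5, epochs=200):
    """Fit weights w and bias b of a linear head; the embeddings are never updated."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(logit(w, b, x)) - y  # gradient of BCE w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify(w, b, x) -> int:
    return int(sigmoid(logit(w, b, x)) >= 0.5)


w, b = train_linear_probe(EMBEDDINGS, IS_CODING_MODEL)
print([classify(w, b, x) for x in EMBEDDINGS])
```

If a purely linear head separates the labels, that suggests the property is linearly readable from the embedding. With real data the probe should be scored on held-out models rather than on its own training set, as this toy example does.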

Paper Structure

This paper contains 20 sections, 2 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: An illustration of the EmbedLLM Pipeline. An embedder network is pretrained to convert models into vector embeddings. Downstream applications like model routing are adapted by training an additional linear layer on top of these embeddings.
  • Figure 2: An illustration of the traditional workflow of LLM benchmarking.
  • Figure 3: An illustration of the traditional workflow of model routing, using exemplar routing methodologies from ong2024routellm.
  • Figure 4: Accuracy of the MF router compared to baselines. The MF router performs better across the whole test set and achieves accuracies close to those of the single-best model on every benchmark.
  • Figure 5: Left: Kendall's tau test results for accuracy prediction on each benchmark, sorted. The "Significance" column gives the number of random model splits, out of 100, in which a significant correlation was detected at the 5% level. Right: Actual model accuracies on MathQA compared with accuracies predicted from embeddings trained without MathQA data.
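The Figure 5 caption describes testing Kendall's tau between actual and predicted benchmark accuracies at a 5% significance level. Below is a minimal sketch of that test for a single model split, assuming no ties and using the large-sample normal approximation for the p-value. The accuracy values are hypothetical. `scipy.stats.kendalltau` is the usual library route; for small samples without ties it may compute an exact p-value, so its numbers can differ from this approximation.

```python
import math
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs. Assumes no ties."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def kendall_p_value(tau, n):
    """Two-sided p-value from the large-sample normal approximation (no ties)."""
    z = 3 * tau * math.sqrt(n * (n - 1)) / math.sqrt(2 * (2 * n + 5))
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical held-out models: actual vs. embedding-predicted benchmark accuracy.
actual = [0.31, 0.42, 0.55, 0.38, 0.61, 0.47, 0.29, 0.52]
predicted = [0.35, 0.40, 0.50, 0.41, 0.58, 0.49, 0.30, 0.53]

tau = kendall_tau(actual, predicted)
p = kendall_p_value(tau, len(actual))
print(f"tau = {tau:.3f}, p = {p:.4f}, significant at 5%: {p < 0.05}")
```

For these toy values, 26 of the 28 model pairs are ranked in the same order by both lists, so tau = 24/28 ≈ 0.857, and the p-value falls well below 0.05. Repeating the test over many random model splits and counting how often p < 0.05 yields a tally comparable to the caption's "Significance" column.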