Representing LLMs in Prompt Semantic Task Space

Idan Kashani, Avi Mendelson, Yaniv Nemcovsky

TL;DR

This work tackles the challenge of selecting top-performing LLMs for diverse prompts without retraining, by representing each model as a linear operator in the prompts' semantic space. Prompt embeddings $E(p)$ map prompts into a common space, and each model is characterized by a vector $E(M)$ such that the predicted success on a query prompt $q$ is $\text{Succ}(M,q) = E(M) \cdot E(q)$. The model embeddings are obtained in closed form via a regularized pseudoinverse of the source-prompt matrix, enabling training-free, scalable predictions and real-time model selection, with strong performance, including in out-of-sample settings. Empirically, the method matches or surpasses state-of-the-art baselines on success prediction and model selection tasks, at negligible compute cost and with easy incremental updates for new models or prompts. The approach provides a semantically grounded, interpretable framework for organizing and retrieving LLM capabilities across expanding model and benchmark repositories, with clear paths for multi-task extensions and broader performance properties.
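
The closed-form construction summarized above can be illustrated with a minimal sketch. It assumes a ridge-style regularized pseudoinverse, $W = (P^\top P + \varepsilon I)^{-1} P^\top Y$, where $P$ stacks the source-prompt embeddings row-wise and $Y$ holds per-prompt success labels for each model. The exact regularization used by the authors is not given in this summary, so the formula, the function names, the `eps` parameter, and the synthetic data below are all assumptions made for illustration only.

```python
import numpy as np

def fit_model_embeddings(prompt_emb, success, eps=1e-2):
    """Assumed ridge form: W = (P^T P + eps*I)^-1 P^T Y, one row per model."""
    dim = prompt_emb.shape[1]
    gram = prompt_emb.T @ prompt_emb + eps * np.eye(dim)
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(gram, prompt_emb.T @ success).T  # (n_models, dim)

def predict_success(model_emb, query_emb):
    """Score each (query, model) pair by the dot product E(q) . E(M)."""
    return query_emb @ model_emb.T  # (n_queries, n_models)

def select_model(model_emb, query_emb):
    """Pick, for each query, the index of the model with the highest score."""
    return np.argmax(predict_success(model_emb, query_emb), axis=-1)

# Synthetic stand-ins (hypothetical): random prompt embeddings and 0/1 success labels.
rng = np.random.default_rng(0)
n_prompts, dim, n_models = 200, 16, 3
P = rng.normal(size=(n_prompts, dim))
hidden = rng.normal(size=(n_models, dim))
Y = (P @ hidden.T > 0).astype(float)

W = fit_model_embeddings(P, Y)      # no gradient-based training involved
scores = predict_success(W, P[:5])  # predicted success for five query prompts
best = select_model(W, P[:5])       # chosen model index per query
```

Under this assumed form, each model's embedding depends only on its own column of `Y`, so adding a model to the repository amounts to solving for one more column while the existing embeddings stay unchanged.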

Abstract

Large language models (LLMs) achieve impressive results over various tasks, and ever-expanding public repositories contain an abundance of pre-trained models. Therefore, identifying the best-performing LLM for a given task is a significant challenge. Previous works have suggested learning LLM representations to address this. However, these approaches present limited scalability and require costly retraining to encompass additional models and datasets. Moreover, the produced representation utilizes distinct spaces that cannot be easily interpreted. This work presents an efficient, training-free approach to representing LLMs as linear operators within the prompts' semantic task space, thus providing a highly interpretable representation of the models' application. Our method utilizes closed-form computation of geometrical properties and ensures exceptional scalability and real-time adaptability to dynamically expanding repositories. We demonstrate our approach on success prediction and model selection tasks, achieving competitive or state-of-the-art results with notable performance in out-of-sample scenarios.

Paper Structure

This paper contains 31 sections, 5 equations, 5 figures, 2 tables, 2 algorithms.

Figures (5)

  • Figure 1: The projection of a prompt embedding $E(p)$ on a model embedding $\mathbf{E(M)}_i$ yields a score predicting the model's success on that prompt.
  • Figure 2: Model embeddings creation time vs. number of prompt samples (left) and models (right), on CPU and GPU (logarithmic scale).
  • Figure 3: Success Prediction ROC curves describe the true positive rate vs. the false positive rate across thresholds.
  • Figure 4: A per-benchmark breakdown of Success Prediction (Accuracy and AUC) and Model Selection (Recall) in the EmbedLLM OOS environment for embedding dimensions 384 (top) and 768 (bottom). Our training-free method delivers performance competitive with the EmbedLLM baseline at a fraction of the computational cost.
  • Figure 5: The effect of $\varepsilon$ on the performance metrics discussed in our work.