Representing LLMs in Prompt Semantic Task Space
Idan Kashani, Avi Mendelson, Yaniv Nemcovsky
TL;DR
This work tackles the challenge of selecting top-performing LLMs for diverse prompts without retraining, by representing each model as a linear operator in the prompts' semantic space. A prompt embedding $E(q)$ maps each query $q$ into a common space, and each model $M$ is characterized by a vector $E(M)$ such that the predicted success on a prompt is $\text{Succ}(M,q) = E(M) \cdot E(q)$. The model embeddings are obtained in closed form via a regularized pseudoinverse of the source-prompt matrix, enabling training-free, scalable predictions and real-time model selection, with strong performance even in out-of-sample settings. Empirically, the method matches or surpasses state-of-the-art baselines on success prediction and model selection tasks, while requiring negligible compute and supporting easy incremental updates for new models or prompts. The approach provides a semantically grounded, interpretable framework for organizing and retrieving LLM capabilities across expanding model and benchmark repositories, with clear paths toward multi-task extensions and other performance properties.
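The closed-form step described above can be illustrated with a minimal sketch. It assumes the regularized pseudoinverse takes the standard ridge (Tikhonov) form $(P^\top P + \lambda I)^{-1} P^\top S$, where $P$ stacks the source-prompt embeddings and $S$ holds observed per-model success scores; the paper's exact regularization may differ. The function names, the `lam` parameter, and the synthetic data are hypothetical and used for illustration only.

```python
import numpy as np

def fit_model_embeddings(P, S, lam=1e-2):
    """Closed-form model embeddings (assumed ridge form, not confirmed by the source).

    P: (n_prompts, d) array, row i is the prompt embedding E(q_i).
    S: (n_prompts, n_models) array of observed success scores.
    Returns W: (n_models, d) array, row j is the model embedding E(M_j).
    """
    d = P.shape[1]
    # Regularized normal equations: (P^T P + lam * I) X = P^T S
    A = P.T @ P + lam * np.eye(d)
    X = np.linalg.solve(A, P.T @ S)  # shape (d, n_models)
    return X.T

def predict_success(W, q):
    """Predicted success of every model on one prompt: E(M) . E(q)."""
    return W @ q

def select_model(W, q):
    """Index of the model with the highest predicted success on q."""
    return int(np.argmax(predict_success(W, q)))

# Synthetic illustration: random vectors stand in for real prompt embeddings.
rng = np.random.default_rng(0)
P = rng.normal(size=(200, 8))        # 200 source prompts, 8-dim embeddings
hidden_W = rng.normal(size=(3, 8))   # hypothetical ground truth for 3 models
S = P @ hidden_W.T                   # noise-free linear success scores

W = fit_model_embeddings(P, S, lam=1e-6)
new_prompt = rng.normal(size=8)
best = select_model(W, new_prompt)
```

Because each column of `S` is solved independently, adding a new model under this sketch only requires solving for one more column against the same matrix `A`, which is consistent with the incremental updates the summary mentions.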
Abstract
Large language models (LLMs) achieve impressive results across various tasks, and ever-expanding public repositories contain an abundance of pre-trained models. Identifying the best-performing LLM for a given task is therefore a significant challenge. Previous works have suggested learning LLM representations to address this. However, these approaches scale poorly and require costly retraining to encompass additional models and datasets. Moreover, the representations they produce lie in distinct spaces that cannot be easily interpreted. This work presents an efficient, training-free approach to representing LLMs as linear operators within the prompts' semantic task space, thus providing a highly interpretable representation of the models' application. Our method relies on closed-form computation of geometrical properties, which ensures scalability and real-time adaptability to dynamically expanding repositories. We demonstrate our approach on success prediction and model selection tasks, achieving competitive or state-of-the-art results, with notable performance in out-of-sample scenarios.
