Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond
Xinyu Wang, Hainiu Xu, Lin Gui, Yulan He
TL;DR
This work introduces FUTE, a Framework for Unified Task Embeddings that places datasets and models from diverse architectures, including prompt-based LLMs, in a single vector space. It decouples task information into a Dataset Task Embedding (DTE) $e_D$ and a Model Task Embedding (MTE) $e_M$, using a fixed surrogate base model $\Theta_T$ and unsupervised data $U$ to enable cross-model comparisons without relying on the original training data. The methodology extends two prior task-embedding approaches (TaskEmb, TuPaTE) to a model-agnostic setting and demonstrates competitive performance in both small-language-model and LLM experiments, including zero-shot prompt selection scenarios. Key contributions include (i) formalizing separate DTE and MTE learning, (ii) enabling MTE for LLMs by treating each LLM together with its prompt as a single unit, (iii) leveraging CrossFit as a dataset-agnostic unsupervised source, and (iv) showing cross-domain transferability and efficiency advantages. This framework broadens the applicability and comparative analysis of task embeddings across heterogeneous models, with potential impact on multi-model adaptation, prompt engineering, and model interpretability.
Abstract
Task embedding, a meta-learning technique that captures task-specific information, has gained popularity, especially in areas such as multi-task learning, model editing, and interpretability. However, it faces challenges with the emergence of prompt-guided Large Language Models (LLMs) operating in a gradient-free manner. Existing task embedding methods rely on fine-tuned, task-specific language models, which hinders the adaptability of task embeddings across diverse models, especially prompt-based LLMs. To harness the potential of task embeddings in the era of LLMs, we propose a framework for unified task embeddings (FUTE), harmonizing task embeddings from various models, including smaller language models and LLMs with varied prompts, within a single vector space. Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios, while maintaining performance comparable to architecture-specific methods.
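
To make the benefit of a single vector space concrete, the following is a minimal sketch of how embeddings from heterogeneous sources could be compared once they share one space. It is an illustration under stated assumptions, not the FUTE implementation: cosine similarity is assumed as the comparison measure, and the vectors and the names `cosine`, `rank_by_similarity`, `dataset_embedding`, and `model_embeddings` are hypothetical toy values rather than embeddings produced by the framework.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank candidate embeddings by cosine similarity to the query, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for embeddings in the shared space.
dataset_embedding = [1.0, 0.0]
model_embeddings = {
    "small_lm_finetuned": [2.0, 0.0],  # assumed: a fine-tuned small language model
    "llm_with_prompt_a": [1.0, 1.0],   # assumed: an LLM paired with one prompt
    "llm_with_prompt_b": [0.0, 1.0],   # assumed: the same LLM with another prompt
}

ranking = rank_by_similarity(dataset_embedding, model_embeddings)
```

Because every candidate lives in the same space, a fine-tuned model and an LLM with a particular prompt can be scored against the same dataset embedding with one function. In this toy setup the ranking places `small_lm_finetuned` first, followed by `llm_with_prompt_a` and then `llm_with_prompt_b`.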
