Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond

Xinyu Wang, Hainiu Xu, Lin Gui, Yulan He

TL;DR

This work introduces FUTE, a framework for learning Unified Task Embeddings that places datasets and models from diverse architectures, including prompt-based LLMs, in a single vector space. It decouples task information into a Dataset Task Embedding (DTE) $e_D$ and a Model Task Embedding (MTE) $e_M$, using a fixed surrogate base model $\Theta_T$ and unsupervised data $U$ to enable cross-model comparisons without relying on the original training data. The methodology extends two prior task-embedding approaches (TaskEmb, TuPaTE) to a model-agnostic setting and demonstrates competitive performance in both small-language-model and LLM experiments, including zero-shot prompt selection scenarios. Key contributions include (i) formalizing separate DTE and MTE learning, (ii) enabling MTE for LLMs by treating each prompt-LLM combination as a single model, (iii) leveraging CrossFit as a dataset-agnostic unsupervised source, and (iv) showing cross-domain transferability and efficiency advantages. This framework broadens the applicability and comparative analysis of task embeddings across heterogeneous models, with potential impact on multi-model adaptation, prompt engineering, and model interpretability.

Abstract

Task embedding, a meta-learning technique that captures task-specific information, has gained popularity, especially in areas such as multi-task learning, model editing, and interpretability. However, it faces challenges with the emergence of prompt-guided Large Language Models (LLMs) operating in a gradient-free manner. Existing task embedding methods rely on fine-tuned, task-specific language models, which hinders the adaptability of task embeddings across diverse models, especially prompt-based LLMs. To harness the potential of task embeddings in the era of LLMs, we propose a framework for unified task embeddings (FUTE), harmonizing task embeddings from various models, including smaller language models and LLMs with varied prompts, within a single vector space. Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios, while keeping their performance comparable to that of architecture-specific methods.
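To make the idea of comparing embeddings in a single vector space concrete, the following is a minimal sketch rather than the authors' implementation. It assumes cosine similarity as the comparison measure, and the embedding values and names such as `llm_with_prompt_a` are made up purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to `query`."""
    scores = {name: cosine_similarity(query, vec) for name, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings that already live in one shared vector space.
dataset_embedding = [1.0, 0.0, 1.0]
model_embeddings = {
    "small_lm_finetuned": [0.9, 0.1, 0.8],  # a fine-tuned small language model
    "llm_with_prompt_a": [0.0, 1.0, 0.0],   # an LLM paired with one prompt
    "llm_with_prompt_b": [1.0, 0.0, 0.2],   # the same LLM paired with another prompt
}

ranking = rank_by_similarity(dataset_embedding, model_embeddings)
print(ranking)
```

Because every vector shares the same space, a fine-tuned small model and an LLM paired with a prompt can appear in the same ranking, which would not be meaningful if each embedding came from a model-specific space.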

Paper Structure

This paper contains 44 sections, 6 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Overview of FUTE. Existing methods generate task embeddings from model parameters, which confines them to models of identical architecture and initialization. This limitation, depicted on the left, results in embeddings that reside in separate vector spaces depending on the model, making cross-model similarity comparisons impossible. Our unified framework introduces a method for independently computing task embeddings, enabling the incorporation of diverse models, ranging from different neural architectures to Large Language Models (LLMs) with distinct prompts, into a unified vector space.
  • Figure 2: Comparison between FUTE and existing methods. (A) Existing methods typically use data $D=\{(x_i, y_i)\}_{i=1}^{n}$ and model $\Theta_M$ to generate a single task embedding $e$ for both the dataset and the model. (B) FUTE derives the dataset task embedding (DTE) $e_D$ by introducing an independent surrogate base model $\Theta_T$. (C) FUTE further derives the model task embedding (MTE) $e_M$ by incorporating unsupervised data $U$ to produce the alternative input $\{(x'_i, \hat{y}^M_i)\}_{i=1}^{m}$, enabling model-specific embeddings without direct dependency on task data. (D) Additionally, FUTE computes MTEs for Large Language Models (LLMs) with prompts by treating the combination of a prompt and an LLM as a single model.
  • Figure 3: t-SNE visualization of language model MTEs using FUTE with TuPaTE. Different task categories are represented by different colors, while different models are indicated by distinct shapes. For example, a red circle indicates the MTE of BERT trained on a CR dataset.
  • Figure 4: t-SNE visualization of LLM MTEs using FUTE with TuPaTE. Prompts related to different task categories are represented by different colors, while different LLMs are indicated by distinct shapes. For example, a blue square indicates the MTE of Llama2-13B guided by an SA prompt.
  • Figure A1: t-SNE visualization of language model MTEs using FUTE with TuPaTE. Different task categories are represented by different colors, while different models or datasets are indicated by distinct shapes.
  • ...and 1 more figure
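The data flow described in the Figure 2 caption, panels (B) to (D), can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's code: all function names are hypothetical, `toy_embedder` is a trivial placeholder standing in for running TaskEmb or TuPaTE on the fixed surrogate model $\Theta_T$, and `toy_llm` stands in for a real LLM call.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]                               # (input text, label)
Predictor = Callable[[str], str]                        # any model mapping text to a label
Embedder = Callable[[Sequence[Example]], List[float]]   # embedding method on the surrogate model


def dataset_task_embedding(data: Sequence[Example], embed: Embedder) -> List[float]:
    """Panel (B): embed a labeled dataset D using only the surrogate model."""
    return embed(data)


def model_task_embedding(predict: Predictor, unlabeled: Sequence[str],
                         embed: Embedder) -> List[float]:
    """Panel (C): label unsupervised data U with the model, then embed the result."""
    pseudo_labeled = [(x, predict(x)) for x in unlabeled]
    return embed(pseudo_labeled)


def with_prompt(prompt: str, llm: Predictor) -> Predictor:
    """Panel (D): treat a prompt plus an LLM as one single model."""
    return lambda x: llm(prompt + "\n" + x)


def toy_embedder(data: Sequence[Example]) -> List[float]:
    """Placeholder only: label frequencies, NOT a real task-embedding method."""
    labels = [y for _, y in data]
    return [labels.count("pos") / len(labels), labels.count("neg") / len(labels)]


def toy_classifier(text: str) -> str:
    """Stand-in for a fine-tuned small language model."""
    return "pos" if "good" in text else "neg"


def toy_llm(text: str) -> str:
    """Stand-in for a gradient-free LLM call."""
    return "pos" if "good" in text else "neg"


D = [("good film", "pos"), ("bad film", "neg")]
U = ["good film", "bad film", "good food", "good day"]

e_D = dataset_task_embedding(D, toy_embedder)
e_M = model_task_embedding(toy_classifier, U, toy_embedder)
e_prompted = model_task_embedding(with_prompt("Classify the sentiment:", toy_llm), U, toy_embedder)
print(e_D, e_M, e_prompted)
```

The point of the sketch is that `model_task_embedding` only ever calls the model as a black box on unlabeled inputs, so the same code path serves a fine-tuned classifier and a prompted LLM alike, while the embedding itself always comes from the same surrogate-side procedure.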