Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning

David Schulte, Felix Hamborg, Alan Akbik

Abstract

Intermediate task transfer learning can greatly improve model performance. If, for example, little training data is available for emotion detection, first fine-tuning a language model on a sentiment classification dataset can substantially improve performance. But which task should be chosen as the intermediate task? Prior methods that produce useful task rankings are infeasible for large source pools, as they require forward passes through all source language models. We overcome this by introducing Embedding Space Maps (ESMs), light-weight neural networks that approximate the effect of fine-tuning a language model. We conduct the largest study on NLP task transferability and task selection to date, with 12k source-target pairs. We find that applying ESMs to a prior method reduces execution time and disk space usage by factors of 10 and 278, respectively, while retaining high selection performance (avg. regret@5 score of 2.95).
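To make the idea of an ESM more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract only says that ESMs are light-weight neural networks; here we assume the simplest such network, a single affine map from base-model embeddings to fine-tuned-model embeddings. The embeddings are synthetic stand-ins, the names `fit_esm` and `apply_esm` are hypothetical, and a closed-form least-squares fit is swapped in for gradient-based training to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for real embeddings (assumption): in practice,
# base_emb would come from the base model and tuned_emb from the same
# model after fine-tuning on a source task, for the same input texts.
n_texts, dim = 512, 16
base_emb = rng.normal(size=(n_texts, dim))
true_map = rng.normal(size=(dim, dim))
true_shift = rng.normal(size=dim)
noise = 0.01 * rng.normal(size=(n_texts, dim))
tuned_emb = base_emb @ true_map + true_shift + noise


def fit_esm(base, tuned):
    """Fit an affine map (weight, bias) minimizing ||base @ weight + bias - tuned||^2."""
    ones = np.ones((base.shape[0], 1))
    design = np.hstack([base, ones])  # append a constant column for the bias
    solution, *_ = np.linalg.lstsq(design, tuned, rcond=None)
    return solution[:-1], solution[-1]


def apply_esm(base, weight, bias):
    """Approximate fine-tuned embeddings from base embeddings."""
    return base @ weight + bias


# Fit on the first 400 texts, evaluate the approximation on the rest.
weight, bias = fit_esm(base_emb[:400], tuned_emb[:400])
approx = apply_esm(base_emb[400:], weight, bias)
mse = float(np.mean((approx - tuned_emb[400:]) ** 2))
print(f"held-out MSE: {mse:.6f}")
```

Under this assumption, one map per source task would stand in for that task's fine-tuned model: the target data is embedded once with the base model, and each source's map is then applied to those embeddings, which avoids a forward pass through every source model.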

Paper Structure

This paper contains 22 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Embedding Space Maps approximate how a fine-tuned language model embeds an input text $x$ by transforming embeddings produced by the base model.
  • Figure 2: We use t-SNE to visualize embeddings of inputs from the SNLI validation split produced by BERT (left), BERT fine-tuned on SNLI (middle), and BERT followed by an ESM that was trained using the fine-tuned model (right). The ESM-transformed embeddings are clearly arranged by class. While the classes are not as well separated as in the embeddings of the fine-tuned model, a clear gradient is visible, even after dimensionality reduction.
  • Figure 3: The baseline performance indicates the performance resulting from fine-tuning the base model without any intermediate task. Marks indicate the sources ranked highest by ESM-LogME and LogME.
  • Figure 4: Input Column Assignment
  • Figure 5: Label Column Assignment
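
The abstract reports selection quality as regret@5, and Figure 3 compares the highest-ranked sources against the baseline. The text above does not define the metric, so the sketch below assumes the common relative definition: the gap between the best achievable target performance over all sources and the best performance among the top-k ranked sources, divided by the former and expressed as a percentage. The function name `regret_at_k`, the source names, and all scores are hypothetical illustration values, not results from the paper.

```python
def regret_at_k(true_perf, ranking, k):
    """Relative regret (in percent) of picking the best source among the top-k ranked.

    true_perf: dict mapping source name -> target performance after transfer
    ranking:   list of source names, best-ranked first
    """
    best_overall = max(true_perf.values())
    best_in_top_k = max(true_perf[source] for source in ranking[:k])
    return 100.0 * (best_overall - best_in_top_k) / best_overall


# Hypothetical target performance after transferring from each source.
perf = {"source_a": 0.80, "source_b": 0.78, "source_c": 0.74, "source_d": 0.70}

# Hypothetical ranking produced by a task selection method.
ranking = ["source_b", "source_c", "source_a", "source_d"]

r1 = regret_at_k(perf, ranking, 1)  # top-1 misses the best source
r3 = regret_at_k(perf, ranking, 3)  # top-3 contains the best source
print(f"regret@1 = {r1:.2f}, regret@3 = {r3:.2f}")
```

Under this definition, a score of 0 means the best source is among the top k, and lower is better.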