Exploring and Predicting Transferability across NLP Tasks

Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer

TL;DR

This work presents a large-scale empirical study of transferability across 33 NLP tasks spanning classification/regression (CR), question answering (QA), and sequence labeling (SL). It shows that intermediate fine-tuning on a source task can yield gains even when data are scarce or the source task differs from the target. The paper introduces two task-embedding methods, TextEmb and TaskEmb, for predicting the most transferable source tasks for a given target, and demonstrates that TaskEmb consistently improves transferability predictions over data-size baselines. Evaluating both in-class and out-of-class transfers, it finds that source-target similarity and domain alignment are crucial factors, sometimes more influential than raw source data size, especially in data-constrained regimes. The practical contribution is a publicly released task library, together with code for computing task embeddings and identifying beneficial source tasks, which enables more principled source selection in transfer learning for NLP. Overall, the findings highlight the nuanced, data-dependent nature of transfer in NLP and provide predictive tools for efficient source-task selection.

Abstract

Recent advances in NLP demonstrate the effectiveness of training large-scale language models and transferring them to downstream tasks. Can fine-tuning these models on tasks other than language modeling further improve performance? In this paper, we conduct an extensive study of the transferability between 33 NLP tasks across three broad classes of problems (text classification, question answering, and sequence labeling). Our results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task (e.g., part-of-speech tagging transfers well to the DROP QA dataset). We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task, and we validate their effectiveness in experiments controlled for source and target data size. Overall, our experiments reveal that factors such as source data size, task and domain similarity, and task complexity all play a role in determining transferability.

Paper Structure

This paper contains 33 sections, 7 equations, 3 figures, and 34 tables.

Figures (3)

  • Figure 1: A demonstration of our task embedding pipeline. Given a target task, we first compute its task embedding and then identify the most similar source task embedding (in this example, WikiHop) from a precomputed library via cosine similarity. Finally, we perform intermediate fine-tuning of BERT on the selected source task before fine-tuning on the target task.
  • Figure 2: In these plots (best viewed in zoom with color), each violin corresponds to a target task in the specified data regime. Each point inside a violin represents an individual source task; its color denotes task class, and its y-coordinate denotes target task performance after transfer. Above each violin, we provide the best source task (highest point within the violin) and TaskEmb's top-ranked source task (the red star). The horizontal black line in each violin represents the baseline target task performance of BERT without intermediate fine-tuning. TaskEmb generally selects source tasks that yield positive transfer, and often selects the best source task.
  • Figure 3: A 2D visualization of the task spaces of TextEmb and TaskEmb. TextEmb captures domain similarity (e.g., the Penn Treebank SL tasks are highly interconnected), while TaskEmb focuses more on task similarity (the two part-of-speech tagging tasks are interconnected despite their domain dissimilarity).
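
The source-selection step described in the Figure 1 caption (compare a target task embedding against a precomputed library using cosine similarity, then pick the closest source task) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' released code: the function names `cosine_similarity` and `rank_source_tasks`, the three-dimensional vectors, and the task names other than WikiHop are assumptions made for the example, and real task embeddings would be far higher-dimensional and computed from a model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_source_tasks(target_embedding, library):
    """Return (task name, similarity) pairs, most similar source task first."""
    scores = {
        name: cosine_similarity(target_embedding, embedding)
        for name, embedding in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical precomputed library: the vectors are invented toy values,
# and the two placeholder task names are not taken from the paper.
library = {
    "WikiHop": [0.9, 0.1, 0.3],
    "placeholder_tagging_task": [0.1, 0.8, 0.2],
    "placeholder_classification_task": [0.2, 0.1, 0.9],
}

# Hypothetical embedding of the target task (also an invented toy value).
target_embedding = [0.8, 0.2, 0.4]

ranking = rank_source_tasks(target_embedding, library)
best_source = ranking[0][0]  # candidate for intermediate fine-tuning
print(best_source)
```

In the pipeline the caption describes, the top-ranked task would then be used for intermediate fine-tuning of BERT before fine-tuning on the target task; that training step is outside the scope of this sketch.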