
$α$VIL: Learning to Leverage Auxiliary Tasks for Multitask Learning

Rafael Kourdis, Gabriel Gordon-Hall, Philip John Gorinski

TL;DR

The paper tackles negative and positive transfer in multitask learning by introducing αVIL, a gradient-based meta-optimization framework that dynamically weights auxiliary tasks using task-specific model updates. αVIL defines α-variables that mix the parameter deltas produced by single-task updates and optimizes these α-variables to minimize the target task loss, then updates both the model parameters and the task weights accordingly. Empirical results in Computer Vision (MultiMNIST) and Natural Language Understanding (RoBERTa-based NLU tasks) show that αVIL can outperform single-task and standard multitask baselines and often surpasses strong target-focused methods like DIW, particularly in test performance and generalization. The approach is flexible and extensible, with potential for alternative α-optimization strategies and joint post-estimation optimization to further improve outcomes.

Abstract

Multitask Learning is a Machine Learning paradigm that aims to train a range of (usually related) tasks with the help of a shared model. While the goal is often to improve the joint performance of all training tasks, another approach is to focus on the performance of a specific target task, while treating the remaining ones as auxiliary data from which to possibly leverage positive transfer towards the target during training. In such settings, it becomes important to estimate the positive or negative influence auxiliary tasks will have on the target. While many ways have been proposed to estimate task weights before or during training, they typically rely on heuristics or an extensive search of the weighting space. We propose a novel method called $α$-Variable Importance Learning ($α$VIL) that is able to adjust task weights dynamically during model training, by making direct use of task-specific updates of the underlying model's parameters between training epochs. Experiments indicate that $α$VIL is able to outperform other Multitask Learning approaches in a variety of settings. To our knowledge, this is the first attempt at making direct use of model updates for task weight estimation.
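The mixing idea described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy linear-regression model, a single gradient step per task as the "task-specific update", plain gradient descent on the α-variables, and a weight update of the form `weights + (alphas - 1)`. The function names (`mse_loss`, `mse_grad`, `alpha_mix_step`) and all hyperparameters are hypothetical.

```python
import numpy as np

def mse_loss(theta, X, y):
    """Mean squared error of a linear model (toy stand-in for a task loss)."""
    r = X @ theta - y
    return float(np.mean(r ** 2))

def mse_grad(theta, X, y):
    """Gradient of mse_loss with respect to theta."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)

def alpha_mix_step(theta, tasks, weights, target_dev,
                   lr=0.1, alpha_lr=0.1, alpha_steps=20):
    """One illustrative round of delta mixing (assumed form, see lead-in).

    tasks      : list of (X, y) training sets, one per task
    weights    : current task weights, shape (num_tasks,)
    target_dev : (X, y) held-out data of the target task
    """
    # 1) One weighted single-task update per task, each starting from the
    #    shared parameters; the resulting parameter change is the task delta.
    deltas = np.stack([
        -lr * w * mse_grad(theta, X, y)
        for (X, y), w in zip(tasks, weights)
    ])  # shape (num_tasks, dim)

    # 2) Alpha variables start at 1 (plain sum of deltas) and are tuned by
    #    gradient descent to lower the target loss of the mixed parameters.
    alphas = np.ones(len(tasks))
    X_dev, y_dev = target_dev
    for _ in range(alpha_steps):
        mixed = theta + alphas @ deltas
        # Chain rule: dL/dalpha_i = grad_theta L(mixed) . delta_i
        alphas -= alpha_lr * (deltas @ mse_grad(mixed, X_dev, y_dev))

    # 3) Apply the mixed update and shift task weights by (alpha - 1)
    #    (the weight rule here is an assumption of this sketch).
    new_theta = theta + alphas @ deltas
    new_weights = weights + (alphas - 1.0)
    return new_theta, new_weights, alphas
```

In this toy setting, a task whose delta points towards lower target loss ends up with an α above 1, while a task whose delta points away from it is pushed below 1, so its weight shrinks in later rounds.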

Paper Structure (7 sections, 2 equations, 4 figures, 2 tables, 1 algorithm)

Figures (4)

  • Figure 1: Example images of digits found in MNIST (left) and the superimposed digits present in MultiMNIST (right). Gold labels for the MultiMNIST examples for top-left/bottom-right respectively are 3/5, 1/6, 2/2, 6/7.
  • Figure 2: General Multitask model architecture for MultiMNIST experiments.
  • Figure 3: $\alpha$ parameter values (left) and task-specific weights (right) over the course of $\alpha$VIL training on MultiMNIST. Task 1 is the target task.
  • Figure 4: General Multitask model architecture for NLU experiments.