Measuring Fine-Grained Relatedness in Multitask Learning via Data Attribution

Yiwen Tu, Ziqi Liu, Jiaqi W. Ma, Weijing Tang

TL;DR

This work extends data attribution, which quantifies the influence of individual training data points on model predictions, to the multitask learning (MTL) setting, offering an efficient and fine-grained way to measure task relatedness and improve MTL models.

Abstract

Measuring task relatedness and mitigating negative transfer remain critical open challenges in Multitask Learning (MTL). This work extends data attribution, which quantifies the influence of individual training data points on model predictions, to the MTL setting for measuring task relatedness. We propose the MultiTask Influence Function (MTIF), a method that adapts influence functions to MTL models with hard or soft parameter sharing. Compared to conventional task relatedness measurements, which operate at the level of entire tasks, MTIF provides a fine-grained, instance-level relatedness measure. This measure enables a data selection strategy that effectively mitigates negative transfer in MTL. Through extensive experiments, we demonstrate that MTIF efficiently and accurately approximates the performance of models trained on data subsets. Moreover, the data selection strategy enabled by MTIF consistently improves model performance in MTL. Our work establishes a novel connection between data attribution and MTL, offering an efficient and fine-grained solution for measuring task relatedness and enhancing MTL models.

Paper Structure

This paper contains 43 sections, 8 theorems, 31 equations, 2 figures, 7 tables.

Key Result

Proposition 1

Assume the objective function $\mathcal{L}(\boldsymbol{w}, \boldsymbol{\sigma})$ in eq:data-level-objective is twice differentiable and strictly convex in $\boldsymbol{w}$. For any two tasks $k \neq l$ with $1 \leq k, l \leq K$, the following results hold: (Shared influence) For $1 \leq i \leq n_k$ where the matrix $N:= H_{K+1, K+1} - \sum_{k=1}^K H_{K+1,k} H_{kk}^{-1} H_{k, K+1} \in \mathbb{R}^{
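The matrix $N$ in Proposition 1 has the form of a Schur complement of the task-specific blocks in the full Hessian. The sketch below checks this numerically under one assumption that is not stated in the excerpt above: that task-specific parameter blocks couple only to the shared block (index $K+1$) and not to each other. The block sizes, the random matrices, and all variable names are hypothetical and chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d_task, d_shared = 3, 2, 4  # hypothetical number of tasks and block sizes

# Assumed structure: each task block couples only to the shared block.
dim = K * d_task + d_shared
H = np.zeros((dim, dim))
shared = slice(K * d_task, dim)
task_slices = [slice(k * d_task, (k + 1) * d_task) for k in range(K)]

for t in task_slices:
    A = rng.normal(size=(d_task, d_task))
    H[t, t] = A @ A.T + d_task * np.eye(d_task)       # H_kk, positive definite
    C = 0.1 * rng.normal(size=(d_task, d_shared))     # weak coupling H_{k,K+1}
    H[t, shared] = C
    H[shared, t] = C.T
B = rng.normal(size=(d_shared, d_shared))
H[shared, shared] = B @ B.T + d_shared * np.eye(d_shared)  # H_{K+1,K+1}

# N = H_{K+1,K+1} - sum_k H_{K+1,k} H_kk^{-1} H_{k,K+1}
N = H[shared, shared].copy()
for t in task_slices:
    N -= H[shared, t] @ np.linalg.solve(H[t, t], H[t, shared])

# Block-inversion identity: the shared block of H^{-1} equals N^{-1}.
shared_block_of_inverse = np.linalg.inv(H)[shared, shared]
schur_ok = np.allclose(shared_block_of_inverse, np.linalg.inv(N))
print("shared block of H^{-1} equals N^{-1}:", schur_ok)
```

Under this assumed structure, only the $K$ task blocks $H_{kk}$ and the shared-size matrix $N$ need to be inverted, rather than the full Hessian.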

Figures (2)

  • Figure 1: Instance-level MTIF approximation quality on the synthetic and HAR datasets. The x-axis is the actual loss difference obtained by leave-one-out (LOO) retraining, and the y-axis is the predicted loss difference calculated by MTIF. The first two plots from the left show within-task and between-task results (in order) on the synthetic dataset, while the other two plots present within-task and between-task results (in order) on the HAR dataset. The plots shown here reflect influences on a randomly picked test data point, while the trend holds more broadly on other test data points. The scatter points correspond to training data points in the first task of each dataset.
  • Figure 2: LOO experiments on linear regression. The x-axis is the actual loss difference obtained by LOO retraining, and the y-axis is the predicted loss difference calculated by MTIF. The first two figures from the left show within-task and between-task LOO (in order) results with $\delta=0.4$ and $\alpha=0$, while the other two figures present within-task and between-task results (in order) with $\delta=0.4$ and $\alpha=0.2$.

Theorems & Definitions (13)

  • Example 1: Multitask Linear Regression with Ridge Penalty
  • Example 2: Shared-Bottom Neural Network Model
  • Proposition 1: Instance-Level Within-task Influence, Between-task Influence, and Shared Influence
  • Lemma A.1: Hessian Matrix Structure for Data-Level Inference
  • Lemma A.2: Influence Scores for Instance-Level Analysis
  • proof
  • Lemma A.3: Invertibility of Hessian
  • Lemma A.4: Hessian Inverse
  • proof: Proof of Lemma A.3 and Lemma A.4
  • Lemma B.1: Hessian Matrix Structure for Task-Level Inference
  • ...and 3 more