Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement

Dayou Mao; Yuhao Chen; Yifan Wu; Maximilian Gilles; Alexander Wong

TL;DR

This work investigates the efficiency and robustness of multi-task learning (MTL) with lightweight feature extractors, and critically evaluates the generalizability of the fast gradient surrogate, which replaces parameter-level gradients with feature-level gradients. It benchmarks 15 MTL optimization algorithms on ResNet18 backbones across MetaGraspNet, CityScapes, and NYU-v2, and finds that only MGDA and Aligned-MTL consistently benefit from feature-level gradients, while the other methods can degrade. The authors introduce the Feature Disentanglement (FD) measure, a computationally efficient saliency-based metric derived from gradients with respect to the shared representation, and use Ranking Similarity to show that it aligns with test-time performance. FD offers a faithful, scalable way to identify MTL challenges, and the benchmark results offer practical guidance on applying MTL techniques to small backbones and complex vision tasks.

Abstract

One of the main motivations of multi-task learning (MTL) is to develop neural networks capable of performing inference on multiple tasks simultaneously. While countless methods have been proposed in the past decade investigating robust model architectures and efficient training algorithms, three gaps remain: there is still a lack of understanding of how these methods behave when applied to smaller feature extraction backbones, of the generalizability of the commonly used fast approximation technique of replacing parameter-level gradients with feature-level gradients, and of MTL challenges themselves and how one can identify them efficiently and effectively. In this paper, we focus on these efficiency aspects of existing MTL methods. We first carry out large-scale experiments on the methods with smaller backbones, using the MetaGraspNet dataset as a new test ground. We also compare the existing methods with and without the fast gradient surrogate and empirically study the generalizability of this technique. Lastly, we propose the Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL, and propose the Ranking Similarity score as an evaluation metric for different identifiers to demonstrate the faithfulness of our method.
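To make the distinction between parameter-level and feature-level gradients concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy linear shared encoder `z = W x`, a single input, and two hypothetical squared-error task losses; all names (`feature_gradients`, `parameter_gradient`, `targets`) are illustrative. In this special case the cosine similarity between the two task gradients is identical at both levels, while the feature-level gradient has far fewer entries. With nonlinear backbones and mini-batches the two need not agree, which is the kind of gap the generalizability study examines.

```python
import math

def matvec(W, x):
    # Shared "encoder": z = W x (toy linear map, an assumption for illustration).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def feature_gradients(z, targets):
    # Hypothetical task loss i: 0.5 * ||z - target_i||^2, so dL_i/dz = z - target_i.
    return [[zk - tk for zk, tk in zip(z, t)] for t in targets]

def parameter_gradient(g_z, x):
    # Chain rule for z = W x: dL/dW[k][m] = g_z[k] * x[m], flattened row by row.
    return [gk * xm for gk in g_z for xm in x]

W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]   # 2 features, 3 inputs
x = [1.0, 2.0, -1.0]
targets = [[1.0, 0.0], [-2.0, 3.0]]          # one target per task

z = matvec(W, x)
g_feat = feature_gradients(z, targets)                 # feature-level gradients
g_param = [parameter_gradient(g, x) for g in g_feat]   # parameter-level gradients

cos_feat = cosine(g_feat[0], g_feat[1])
cos_param = cosine(g_param[0], g_param[1])
print(len(g_feat[0]), len(g_param[0]))  # feature gradient is smaller
print(cos_feat, cos_param)
```

The size difference is the source of the speed-up: gradient-manipulation methods that compare task gradients only need one backward pass to the shared representation per task, rather than one through the whole backbone.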

Paper Structure (20 sections, 9 equations, 5 figures, 7 tables)

Introduction
Problem Definition and Related Work
Gradient Manipulation
Gradient Balancing
Gradient Regularization
Fast Gradient Surrogate
Saliency Maps in Explainable AI
Benchmarks
Experiments on MetaGraspNet Dataset
Experiments on CityScapes and NYU-v2
Generalizability of Fast Gradient Surrogate
Feature Disentanglement Measure
Preliminaries
Method: Feature Disentanglement Measure
Evaluation Protocol: Ranking Similarity
...and 5 more sections

Figures (5)

Figure 1: Illustration of the feature disentanglement calculation. Here, $p_{\cdot j}$ denotes the mapping $i \mapsto p_{ij}$, which is the (smoothed) distribution of feature saliency at location $j$ across all tasks. The same holds for $p_{\cdot(j-1)}$ and $p_{\cdot(j+1)}$. If an extracted feature is disentangled for the $T$ downstream tasks, then each distribution $p_{\cdot j}$ should be concentrated on fewer tasks and have lower entropy.
Figure 2: Illustration of the model architecture used on the MetaGraspNet benchmark.
Figure 3: Training dynamics of GDS using gradients w.r.t. shared parameters.
Figure 4: Training dynamics of GMS using gradients w.r.t. shared parameters.
Figure 5: Training dynamics of Feature Disentanglement (FD) using gradients w.r.t. shared parameters.
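The entropy intuition in the Figure 1 caption can be sketched as follows. This is an illustrative approximation only: it assumes non-negative saliency values `saliency[i][j]` for task `i` at feature location `j`, additive smoothing with a hypothetical constant `eps`, and a plain average over locations. The paper's exact definition of the Feature Disentanglement measure may differ.

```python
import math

def saliency_entropy_per_location(saliency, eps=1e-8):
    """Entropy of the across-task saliency distribution p_{.j} at each location j.

    saliency[i][j] is assumed to be a non-negative saliency value for task i
    at feature location j; eps is an assumed smoothing constant.
    """
    num_tasks = len(saliency)
    num_locations = len(saliency[0])
    entropies = []
    for j in range(num_locations):
        column = [saliency[i][j] + eps for i in range(num_tasks)]
        total = sum(column)
        p = [v / total for v in column]  # p_{ij} for fixed j, sums to 1 over tasks
        entropies.append(-sum(q * math.log(q) for q in p))
    return entropies

def mean_entropy(saliency, eps=1e-8):
    # Averaging over locations is an assumption made for this sketch.
    ent = saliency_entropy_per_location(saliency, eps)
    return sum(ent) / len(ent)

# Two tasks, three feature locations.
disentangled = [[1.0, 0.0, 1.0],   # each location is salient for one task only
                [0.0, 1.0, 0.0]]
entangled = [[0.5, 0.5, 0.5],      # every location is equally salient for both tasks
             [0.5, 0.5, 0.5]]

print(mean_entropy(disentangled))
print(mean_entropy(entangled))
```

Under these assumptions, features used by a single task give a per-location entropy near zero, while features shared equally by all $T$ tasks give the maximum value $\log T$.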
