When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review

Maxime Fontana; Michael Spratling; Miaojing Shi

TL;DR

This review surveys how computer vision can exploit multi-task learning under partial supervision, detailing parameter-sharing, fusion, decomposition, and NAS strategies that balance tasks and reduce data-labeling requirements. It clarifies optimization challenges, including loss weighting, gradient conflicts, and Pareto-front approaches, and covers task-grouping and partially supervised techniques (self-supervised, semi-supervised, few-shot) that improve data efficiency. The authors synthesize datasets, tools, and benchmarking results to provide practical guidance for building scalable, data-efficient MTL CV systems. Overall, partially supervised MTL can match or exceed fully supervised performance in many settings, emphasizing the importance of task relationships, adaptive balancing, and comprehensive benchmarking. The work highlights future opportunities in large-scale task pools, adaptive sharing mechanisms, and cross-task learning in diverse CV domains.

Abstract

Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to calculate multiple outputs at once, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can be leveraged not only to lower the data dependency of those methods but also to improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results for such methods.

Paper Structure (36 sections, 43 equations, 15 figures, 6 tables, 1 algorithm)

Introduction
Multi-Task Parameter Sharing
Traditional Parameter Sharing
Sparse Multi-Task Representations
Clustering
Common-Trunk
Feature Fusion
CNN Sharing Strategies
Attention-based Sharing Strategies
Knowledge Decomposition
Tensor Factorization
Knowledge Distillation
Adapters
Neural Architecture Search
Optimisation Challenges
...and 21 more sections

Figures (15)

Figure 1: Overview of the literature review structure. Firstly, we introduce Multi-Task Parameter Sharing in \ref{chapter:MT-parameter-sharing}. Secondly, we review Optimisation Challenges in \ref{chapter:Optimisation}. Thirdly, we review how task relationships can be used to group tasks in \ref{sec:task-grouping}. Finally, we introduce, in \ref{chapter:partial-supervision}, the different partially-supervised computer vision methods in MTL.
Figure 2: Overview of the different reviews on Multi-Task Learning.
Figure 3: Multi-Task Learning has mainly been divided into two architectural design schemes. Hard-parameter sharing (top) splits a shared backbone into task-specific heads, which receive input from the same set of features. Soft-parameter sharing (bottom) uses task-specific networks, but allows information to be shared between them.
Figure 4: Two task-specific CNN models CNN$_{i}$ and CNN$_{j}$. The NDDR layer first concatenates the representations of the respective convolutional blocks. A $1 \times 1$ convolution per task is then applied to this concatenation. Lastly, after batch normalisation, the features are propagated to the next convolutional block of each model.
Figure 5: Visualisation of the different gradient update methods in MTL. The blue arrows represent the projections of the task-specific gradient update noted as $g_{1}$, $g_{2}$ and $g_{3}$. The red arrow represents the aggregated gradient update.
...and 10 more figures
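
The three steps in the Figure 4 caption (concatenate, apply one $1 \times 1$ convolution per task, batch-normalise) can be illustrated with a minimal NumPy sketch. This is only an illustration under stated assumptions, not the implementation from the reviewed work: the function names `nddr_layer` and `batch_norm`, the tensor shapes, and the weight initialisation are all hypothetical, and the batch normalisation here has no learned scale or shift.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalise each channel over the batch and spatial axes (no learned scale/shift)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def nddr_layer(feat_i, feat_j, w_i, w_j):
    """Fuse two task-specific feature maps of shape (N, C, H, W).

    w_i and w_j are (C, 2C) weight matrices, i.e. one 1x1 convolution per task.
    """
    # Step 1: concatenate the two representations along the channel axis.
    stacked = np.concatenate([feat_i, feat_j], axis=1)  # (N, 2C, H, W)
    # Step 2: a 1x1 convolution is a per-pixel linear map over channels.
    out_i = np.einsum("oc,nchw->nohw", w_i, stacked)
    out_j = np.einsum("oc,nchw->nohw", w_j, stacked)
    # Step 3: batch-normalise before passing to each model's next block.
    return batch_norm(out_i), batch_norm(out_j)


rng = np.random.default_rng(0)
n, c, h, w = 4, 3, 5, 5
feat_i = rng.normal(size=(n, c, h, w))
feat_j = rng.normal(size=(n, c, h, w))

# Assumed initialisation: each task starts by mostly keeping its own features.
eye = np.eye(c)
w_i = np.concatenate([0.9 * eye, 0.1 * eye], axis=1)
w_j = np.concatenate([0.1 * eye, 0.9 * eye], axis=1)

out_i, out_j = nddr_layer(feat_i, feat_j, w_i, w_j)
print(out_i.shape, out_j.shape)  # each output keeps the input shape (4, 3, 5, 5)
```

Each output keeps the channel count of a single task's features, so the layer can sit between any pair of matching convolutional blocks. In a trained model the two weight matrices would be learned, letting each task decide how much of the other task's representation to mix in.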
