Joint-Task Regularization for Partially Labeled Multi-Task Learning

Kento Nishi, Junsik Kim, Wanhua Li, Hanspeter Pfister

TL;DR

JTR stands out from existing approaches in that it regularizes all tasks jointly rather than separately in pairs; it therefore achieves linear complexity relative to the number of tasks, while previous methods scale quadratically.

Abstract

Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets. Most multi-task learning methods depend on fully labeled datasets wherein each input example is accompanied by ground-truth labels for all target tasks. Unfortunately, curating such datasets can be prohibitively expensive and impractical, especially for dense prediction tasks which require per-pixel labels for each image. With this in mind, we propose Joint-Task Regularization (JTR), an intuitive technique which leverages cross-task relations to simultaneously regularize all tasks in a single joint-task latent space to improve learning when data is not fully labeled for all tasks. JTR stands out from existing approaches in that it regularizes all tasks jointly rather than separately in pairs -- therefore, it achieves linear complexity relative to the number of tasks while previous methods scale quadratically. To demonstrate the validity of our approach, we extensively benchmark our method across a wide variety of partially labeled scenarios based on NYU-v2, Cityscapes, and Taskonomy.
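
The linear-versus-quadratic claim can be illustrated with a short count. This is only a sketch under an assumption not spelled out in the abstract: that a pairwise method needs one regularization term per ordered pair of tasks, while a joint method needs one stacked input slot per task. The function names are hypothetical.

```python
def pairwise_terms(num_tasks):
    """Assumed cost of pairwise regularization: one term per ordered task pair."""
    return num_tasks * (num_tasks - 1)


def joint_terms(num_tasks):
    """Assumed cost of joint regularization: one stacked input slot per task."""
    return num_tasks


# Compare how the two counts grow as tasks are added.
for t in (3, 5, 10):
    print(t, pairwise_terms(t), joint_terms(t))
```

Under this assumption, going from 3 to 10 tasks raises the pairwise count from 6 to 90, while the joint count only rises from 3 to 10.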

Paper Structure

This paper contains 47 sections, 7 equations, 4 figures, 19 tables, and 1 algorithm.

Figures (4)

  • Figure 1: An overview of Joint-Task Regularization (JTR) for multi-task learning with partially labeled samples. JTR "stacks" predictions and labels, encodes them into a single joint-task latent space, and minimizes the latent embedding distance. JTR regularizes unlabeled task predictions using the labels of other tasks jointly in the latent space.
  • Figure 2: Overview of JTR. We define an input $x$, a shared feature extractor $f_{\phi}$, task-specific decoders $h_{\psi^t}$, and output predictions $\hat{y}^t$. For labeled tasks, supervised loss ($\mathcal{L}_{SL}$) is applied. Then, predictions ($\hat{y}^t$) are concatenated to form $\hat{Y}$, while labels ($y^t$) are concatenated to form a target tensor $Y$ (using $\hat{y}^t$ as a substitute if no label exists). The JTR encoder $g_{\theta_1}$ encodes predictions from multiple tasks into one joint-task latent space. $\mathcal{L}_{Dist}$ enforces $\hat{Y}$'s latent embedding to be close to that of $Y$. Gradients from $\mathcal{L}_{Dist}$ apply to $g_{\theta_1}$ and $h_{\psi^t}$. Gradients from $\mathcal{L}_{Recon}$ apply to $g_{\theta_1}$ and $g_{\theta_2}$, preventing $g_{\theta_1}$ from learning a trivial solution (i.e., encoding all inputs to a single point).
  • Figure 3: Comparison of dense prediction outputs from a SegNet trained with MTPSL and JTR on NYU-v2 "randomlabels" alongside the corresponding input image and ground-truth labels. Examples are sampled randomly from the test set without cherry-picking. Visually, the predictions of the JTR model are slightly more accurate than those of the MTPSL model.
  • Figure 4: Visualization of input images $x$, dense prediction outputs $\hat{y}$, labels $y$, JTR reconstructions of the noisy prediction tensor $g(\hat{Y}_x)$, and JTR reconstructions of the reliable target tensor $g(Y_x)$ for a SegNet trained with JTR on NYU-v2 "randomlabels." Examples are randomly sampled from the training set without cherry-picking. Missing $y_x$ entries indicate unlabeled tasks under "randomlabels."
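
The data flow described in the Figure 2 caption can be sketched in plain Python. This is a toy illustration, not the authors' implementation. It makes several assumptions: each task output is a flat list of floats, the encoder $g_{\theta_1}$ and decoder $g_{\theta_2}$ are plain linear maps, both losses are squared Euclidean distances, and gradient routing is not modeled. All function names are hypothetical.

```python
def stack(tensors):
    """Concatenate per-task vectors into one joint vector."""
    return [v for t in tensors for v in t]


def build_target(preds, labels):
    """Use the label for each task, or the prediction where no label exists."""
    return [pred if label is None else label for pred, label in zip(preds, labels)]


def linear(weights, vec):
    """Apply a linear map given as a list of rows (stand-in for g_theta)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed form of the losses)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def jtr_losses(preds, labels, enc_w, dec_w):
    y_hat = stack(preds)                        # prediction tensor Y-hat
    y = stack(build_target(preds, labels))      # target tensor Y
    z_hat, z = linear(enc_w, y_hat), linear(enc_w, y)
    l_dist = sq_dist(z_hat, z)                  # latent distance term
    # Reconstruction term: decode both embeddings back to their inputs.
    l_recon = sq_dist(linear(dec_w, z_hat), y_hat) + sq_dist(linear(dec_w, z), y)
    return l_dist, l_recon


def identity(n):
    """Identity matrix, used here only to keep the example easy to check."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Two tasks; the second task has no label for this sample.
preds = [[1.0, 2.0], [0.5, 0.5]]
labels = [[1.0, 0.0], None]
l_dist, l_recon = jtr_losses(preds, labels, identity(4), identity(4))
print(l_dist, l_recon)
```

In this toy setup the unlabeled task contributes identical entries to both tensors, so only the labeled task's error moves the latent distance. With a learned, non-identity encoder the embedding mixes all tasks, which is how the labels of one task could influence the predictions of another.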