Auxiliary Learning by Implicit Differentiation

Aviv Navon, Idan Achituve, Haggai Maron, Gal Chechik, Ethan Fetaya

TL;DR

AuxiLearn introduces a bi-level framework that uses implicit differentiation to optimize auxiliary learning for neural networks. It handles two settings: (i) learning a non-linear combination of predefined auxiliary losses to form a single coherent objective, and (ii) generating novel auxiliary tasks from data when none are provided, via a teacher-student setup. The optimization leverages the implicit function theorem to compute hypergradients with respect to auxiliary parameters, using Neumann-series approximations for efficiency. Theoretical analysis uncovers potential overfitting risks in the auxiliary space and highlights the Newton update as a key indicator of auxiliary usefulness. Empirically, AuxiLearn improves main-task performance across image segmentation and low-data classification tasks, often outperforming strong baselines and demonstrating effective automatic auxiliary design and weighting.
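The Neumann-series step mentioned above can be illustrated with a minimal sketch. It approximates an inverse-Hessian-vector product $H^{-1}v$ by the truncated series $\alpha\sum_{j=0}^{J}(I-\alpha H)^j v$, using only Hessian-vector products. The function names (`matvec`, `neumann_inverse_hvp`), the step size `alpha`, the number of terms, and the 2x2 toy matrix are all assumptions made for illustration; this is not the authors' implementation.

```python
def matvec(A, v):
    """Multiply a dense matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def neumann_inverse_hvp(hvp, v, alpha, num_terms):
    """Approximate H^{-1} v by alpha * sum_{j=0}^{J} (I - alpha*H)^j v.

    hvp: function u -> H u, so the Hessian is never formed or inverted.
    alpha: step size; the series converges when the spectral radius of
           (I - alpha*H) is below 1 (an assumption of this sketch).
    num_terms: J, the number of terms added after the j = 0 term.
    """
    term = list(v)   # current term (I - alpha*H)^j v, starting at j = 0
    total = list(v)  # running sum of the terms
    for _ in range(num_terms):
        h_term = hvp(term)
        term = [t - alpha * h for t, h in zip(term, h_term)]
        total = [s + t for s, t in zip(total, term)]
    return [alpha * s for s in total]


# Toy symmetric positive-definite "Hessian" and vector (illustrative values).
H = [[2.0, 0.5], [0.5, 1.0]]
v = [1.0, -1.0]

approx = neumann_inverse_hvp(lambda u: matvec(H, u), v, alpha=0.3, num_terms=200)

# Closed-form 2x2 inverse, used only to check the approximation.
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
exact = [(H[1][1] * v[0] - H[0][1] * v[1]) / det,
         (-H[1][0] * v[0] + H[0][0] * v[1]) / det]
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print(approx, exact, max_error)
```

In a neural-network setting, `hvp` would typically be supplied by automatic differentiation rather than an explicit matrix, and a small `num_terms` trades accuracy for speed.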

Abstract

Training neural networks with auxiliary tasks is a common practice for improving the performance on a main task of interest. Two main challenges arise in this multi-task learning setting: (i) designing useful auxiliary tasks; and (ii) combining auxiliary tasks into a single coherent loss. Here, we propose a novel framework, AuxiLearn, that targets both challenges based on implicit differentiation. First, when useful auxiliaries are known, we propose learning a network that combines all losses into a single coherent objective function. This network can learn non-linear interactions between tasks. Second, when no useful auxiliary task is known, we describe how to learn a network that generates a meaningful, novel auxiliary task. We evaluate AuxiLearn in a series of tasks and domains, including image segmentation and learning with attributes in the low data regime, and find that it consistently outperforms competing methods.

Paper Structure

This paper contains 35 sections, 1 theorem, 8 equations, 8 figures, 8 tables, 2 algorithms.

Key Result

Proposition 1

Let $\mathcal{L}_T(W,\phi)=\sum_i\ell_{main}(\mathbf{x}_i^t,\boldsymbol{y}_i^t,W)+\phi\cdot \ell_{aux}(\mathbf{x}_i^t,\boldsymbol{y}_i^t,W)$. Suppose that $\phi=0$ and that the main task was trained until convergence. Then the gradient with respect to the auxiliary weight $\phi$ is the inner product between the Newton's method update and the gradient of the loss on the auxiliary set.
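As a sketch only, the statement can be written out with the implicit function theorem. The notation $W^*$ for the converged weights and $\mathcal{L}_A$ for the loss on the auxiliary set is assumed here, as is the reading that the sum over $i$ covers both terms of $\mathcal{L}_T$. The exact form in the paper may differ.

```latex
% Stationarity of the training loss at the converged weights W^*(\phi):
\nabla_W \mathcal{L}_T\big(W^*(\phi),\phi\big) = 0 .

% Differentiating with respect to \phi (implicit function theorem):
\nabla_W^2 \mathcal{L}_T \,\frac{\partial W^*}{\partial \phi}
  + \nabla_\phi \nabla_W \mathcal{L}_T = 0
\quad\Longrightarrow\quad
\frac{\partial W^*}{\partial \phi}
  = -\big(\nabla_W^2 \mathcal{L}_T\big)^{-1} \nabla_\phi \nabla_W \mathcal{L}_T .

% For the \mathcal{L}_T of the proposition, the mixed derivative is the auxiliary gradient:
\nabla_\phi \nabla_W \mathcal{L}_T
  = \sum_i \nabla_W \ell_{aux}(\mathbf{x}_i^t,\boldsymbol{y}_i^t,W^*) .

% Chain rule, with \mathcal{L}_A depending on \phi only through W^*:
\nabla_\phi \mathcal{L}_A(W^*)
  = -\,\nabla_W \mathcal{L}_A(W^*)^{\top}
     \big(\nabla_W^2 \mathcal{L}_T\big)^{-1}
     \sum_i \nabla_W \ell_{aux}(\mathbf{x}_i^t,\boldsymbol{y}_i^t,W^*) .
```

Under these assumptions, the factor $\big(\nabla_W^2 \mathcal{L}_T\big)^{-1}\sum_i \nabla_W \ell_{aux}$ plays the role of the Newton update, and at $\phi=0$ the Hessian involves only the main loss. A negative value of $\nabla_\phi \mathcal{L}_A$ would suggest that increasing the auxiliary weight lowers the loss on the auxiliary set.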

Figures (8)

  • Figure 1: The AuxiLearn framework. (a) Learning to combine losses into a single coherent loss term. Here, the auxiliary network operates over a vector of losses. (b) Generating a novel auxiliary task. Here the auxiliary network operates over the input space. In both cases, $g(\cdot~;\phi)$ is optimized using IFT based on $\mathcal{L}_A$.
  • Figure 2: Loss landscape generated by the auxiliary network. Darker is higher. See text for details.
  • Figure 3: Loss images on test examples from NYUv2: (a) original image; (b) semantic segmentation ground truth; (c) auxiliary loss; (d) segmentation (main task) loss; (e) adaptive pixel-wise weight $\sum_j\partial \mathcal{L}_T/\partial \ell_j$.
  • Figure 4: t-SNE applied to auxiliary labels learned for Frog and Deer classes, in CIFAR10. Best viewed in color.
  • Figure 5: Optimizing task weights on the training set reduces to single-task learning.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Proposition 1
  • proof