Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

Yuan Gao; Weizhong Zhang; Wenhan Luo; Lin Ma; Jin-Gang Yu; Gui-Song Xia; Jiayi Ma

TL;DR

The paper tackles how to leverage auxiliary task labels to boost a primary task without raising the inference cost above that of a single-task model. It introduces two asymmetric, architecture-based methods, Aux-G and Aux-NAS, that exploit auxiliary gradients and/or features during training. The cross-task connections that remain after training run only from the primary branch to the auxiliary branch, so they can be removed together with the auxiliary branch at inference, yielding a single-task inference graph. Aux-NAS achieves this with a NAS objective that places an $\ell_1$ penalty on the auxiliary-to-primary connections, driving the search to converge to a model where only primary-to-auxiliary connections remain and the inference overhead is negligible. Extensive experiments across six tasks on NYU v2, CityScapes, and Taskonomy with CNNs and ViTs show consistent gains over optimization-based baselines and robustness across task combinations, datasets, and backbones, with cost scaling linearly in the number of auxiliary tasks. The work offers a practical, architecture-centric route to better primary-task performance under realistic inference constraints, and it can be combined with existing auxiliary learning techniques for further gains.
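As a rough illustration of the regularized search objective described above, one way to write it is shown below. The notation is an assumption made for this sketch and is not taken from the paper: $w$ stands for the network weights, $\alpha$ for all architecture weights, $\alpha^{A \to P}$ for the subset on auxiliary-to-primary connections, and $\lambda$ for the penalty strength. The paper's own equation may differ in form, for example in how the two losses are weighted or how the optimization is split.

$$
\min_{w,\,\alpha}\;\; \mathcal{L}_{\text{pri}}(w, \alpha) \;+\; \mathcal{L}_{\text{aux}}(w, \alpha) \;+\; \lambda \,\bigl\lVert \alpha^{A \to P} \bigr\rVert_1
$$

Under this reading, the $\ell_1$ term pushes every entry of $\alpha^{A \to P}$ toward zero, while the primary-to-auxiliary architecture weights are left unpenalized.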

Abstract

We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure for the primary and auxiliary tasks, which produces different networks for training and inference. Specifically, starting from two single task networks/branches (each representing a task), we propose a novel method with evolving networks where only primary-to-auxiliary links exist as the cross-task connections after convergence. These connections can be removed during the primary task inference, resulting in a single-task inference cost. We achieve this by formulating a Neural Architecture Search (NAS) problem, where we initialize bi-directional connections in the search space and guide the NAS optimization converging to an architecture with only the single-side primary-to-auxiliary connections. Moreover, our method can be incorporated with optimization-based auxiliary learning approaches. Extensive experiments with six tasks on NYU v2, CityScapes, and Taskonomy datasets using VGG, ResNet, and ViT backbones validate the promising performance. The codes are available at https://github.com/ethanygao/Aux-NAS.
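The train/inference asymmetry described in the abstract can be illustrated with a toy model. The following is a minimal sketch, not the authors' implementation: the class `TwoBranchNet`, its field names, and the scalar "layers" (a multiply followed by ReLU) are all hypothetical and chosen only to keep the example dependency-free. It shows that once every auxiliary-to-primary architecture weight is zero, dropping the auxiliary branch leaves the primary output unchanged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TwoBranchNet:
    """Toy two-branch model; each 'layer' is a scalar multiply followed by ReLU."""
    w_pri: List[float]      # primary-branch layer weights (hypothetical)
    w_aux: List[float]      # auxiliary-branch layer weights (hypothetical)
    alpha_a2p: List[float]  # auxiliary-to-primary architecture weights (penalized)
    alpha_p2a: List[float]  # primary-to-auxiliary architecture weights

    def forward_train(self, x: float) -> Tuple[float, float]:
        """Training graph: both branches, with bi-directional cross-task links."""
        p, a = x, x
        for wp, wa, a2p, p2a in zip(self.w_pri, self.w_aux,
                                    self.alpha_a2p, self.alpha_p2a):
            # Each branch mixes in the other branch's previous-layer feature.
            p_next = max(0.0, wp * p + a2p * a)
            a_next = max(0.0, wa * a + p2a * p)
            p, a = p_next, a_next
        return p, a

    def forward_infer(self, x: float) -> float:
        """Inference graph: primary branch only, auxiliary branch removed."""
        p = x
        for wp in self.w_pri:
            p = max(0.0, wp * p)
        return p

    def l1_penalty(self) -> float:
        """L1 norm of the auxiliary-to-primary architecture weights."""
        return sum(abs(v) for v in self.alpha_a2p)


# A "converged" toy model: all auxiliary-to-primary weights are zero.
converged = TwoBranchNet(w_pri=[0.5, 2.0], w_aux=[1.5, 0.3],
                         alpha_a2p=[0.0, 0.0], alpha_p2a=[0.7, 0.4])
p_train, _ = converged.forward_train(1.0)
print(p_train == converged.forward_infer(1.0))
```

In this sketch the primary-to-auxiliary weights stay nonzero, so the auxiliary loss can still send gradients back into the primary branch during training even though the primary forward pass never reads auxiliary features.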

Paper Structure (23 sections, 6 equations, 4 figures, 10 tables)

Introduction
Taxonomy of Our Methods
Related Work
Methods
The Asymmetric Architecture with Soft Parameter Sharing
The Auxiliary Gradient Method
The Auxiliary Feature and Gradient Method with NAS
Fusion Operation
Experiments
Comparison with Optimization-based Methods
Comparison with Architecture-based Methods
Different Primary-Auxiliary Task Combinations
Different Datasets
Different Backbones
Scalability to More Auxiliary Tasks
...and 8 more sections

Figures (4)

Figure 1: Overview of the proposed methods. Both build on an asymmetric architecture that uses different networks for training and inference: gradients and/or features from the auxiliary task are exploited during training, while evaluating the primary task keeps a single-task cost. The first method (leftmost) leverages the auxiliary gradients. The second method (rightmost) exploits both auxiliary features and gradients; its auxiliary-to-primary connections (green dashed lines) are gradually pruned by NAS, so the converged architecture has only primary-to-auxiliary connections (line widths indicate the converged architecture weights). Finally, the primary-to-auxiliary connections and the auxiliary branch can be safely removed, leaving a single-task network (middle) for primary-task inference. The network arrows indicate the direction of the feature flow and, reversed, of the gradient flow. (Best viewed in color.)
Figure 2: The asymmetric primary-auxiliary architecture with soft parameter sharing. (Best viewed in color.)
Figure 3: Illustration of the proposed fusion operator. Note that the auxiliary features (in orange) are concatenated by their architecture weights. The dashed line indicates that the regularized NAS objective in Eq. \ref{nas_obj} allows the entire auxiliary computation to be cut off (along with the following 1x1 conv, whose input becomes 0). (Best viewed in color.)
Figure 4: An illustration for the inter-task connections (i.e., the search space) of the auxiliary learning (ours) and the multi-task learning architectures. We use 3 tasks (or 1 primary task plus 2 auxiliary tasks) as an example.
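The fusion operator from the Figure 3 caption can be sketched at a single spatial location, where a 1x1 convolution reduces to a linear map over channels. This is a hedged illustration only: the function names `conv1x1` and `fuse` are hypothetical, reading "concatenated by architecture weights" as scaling each auxiliary feature before concatenation is an interpretation, and the residual addition to the primary feature is an assumption of this sketch rather than something the caption states.

```python
from typing import List, Sequence


def conv1x1(x: Sequence[float], weight: Sequence[Sequence[float]]) -> List[float]:
    """Bias-free 1x1 convolution at one location: a linear map over channels."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse(primary: Sequence[float],
         aux_feats: Sequence[Sequence[float]],
         alphas: Sequence[float],
         weight: Sequence[Sequence[float]]) -> List[float]:
    """Fuse auxiliary features into the primary feature (assumed residual form)."""
    # Scale each auxiliary feature by its architecture weight, then
    # concatenate along the channel axis.
    concat = [alpha * v for alpha, feat in zip(alphas, aux_feats) for v in feat]
    # Mix the concatenated channels down to the primary channel count.
    mixed = conv1x1(concat, weight)
    # Assumption: the mixed auxiliary signal is added to the primary feature.
    return [p + m for p, m in zip(primary, mixed)]


primary = [1.0, -2.0]
aux_feats = [[3.0, 4.0], [5.0, 6.0]]          # two auxiliary layers, 2 channels each
weight = [[1.0, 0.0, 0.0, 0.0],               # 2 output channels x 4 input channels
          [0.0, 1.0, 0.0, 0.0]]

# With all architecture weights at zero, the 1x1 conv receives a zero input,
# so the fused output equals the primary feature.
print(fuse(primary, aux_feats, [0.0, 0.0], weight) == primary)
```

Because the 1x1 convolution here has no bias, a zero input yields a zero output, which is why the whole auxiliary path can be skipped once the architecture weights reach zero.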

Theorems & Definitions (3)

Remark 1
Remark 2
Remark 3
