Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost
Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma
TL;DR
The paper tackles how to leverage auxiliary task labels to boost a primary task without raising the inference cost above that of a single-task model. It introduces two asymmetric, architecture-based methods, Aux-G and Aux-NAS, that let the primary task benefit from auxiliary gradients and/or features during training. At convergence, the only cross-task connections that remain are primary-to-auxiliary ones; these can be removed at inference, leaving a single-task inference graph. Aux-NAS achieves this with a NAS objective that places an $\ell_1$ penalty on the auxiliary-to-primary connections, so the search converges to a model in which only primary-to-auxiliary connections remain and the inference overhead is negligible. Extensive experiments across six tasks on NYU v2, CityScapes, and Taskonomy with CNNs and ViTs show consistent gains over optimization-based baselines, robustness across task combinations, datasets, and backbones, and linear scalability in the number of auxiliary tasks. The work offers a practical, architecture-centric route to better primary-task performance under realistic inference constraints, and it can be combined with existing auxiliary learning techniques for further gains.
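As a rough sketch of what such a penalized objective could look like (the notation here is assumed for illustration and is not taken from the paper): let $\theta$ denote the network weights, $\alpha^{A \to P}$ and $\alpha^{P \to A}$ the architecture weights on the auxiliary-to-primary and primary-to-auxiliary connections, and $\lambda$ a hypothetical regularization strength.

$$
\min_{\theta,\; \alpha^{A \to P},\; \alpha^{P \to A}} \;\; \mathcal{L}_{\mathrm{pri}} + \mathcal{L}_{\mathrm{aux}} + \lambda \, \lVert \alpha^{A \to P} \rVert_1
$$

Under this reading, the $\ell_1$ term pushes $\alpha^{A \to P}$ toward zero while leaving $\alpha^{P \to A}$ unpenalized, which matches the stated goal of keeping only primary-to-auxiliary connections. How the two task losses are weighted, and how the search itself is optimized, is not specified in this summary.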
Abstract
We aim to exploit additional labels from an independent auxiliary task to boost the performance of the primary task we focus on, while preserving the single-task inference cost of the primary task. Whereas most existing auxiliary learning methods are optimization-based and rely on manipulating loss weights or gradients, our method is architecture-based, with a flexible asymmetric structure for the primary and auxiliary tasks that produces different networks for training and inference. Specifically, starting from two single-task networks/branches (each representing a task), we propose a novel method with evolving networks in which, after convergence, the only cross-task connections are primary-to-auxiliary links. These connections can be removed during primary-task inference, resulting in a single-task inference cost. We achieve this by formulating a Neural Architecture Search (NAS) problem, where we initialize bi-directional connections in the search space and guide the NAS optimization to converge to an architecture with only the one-sided primary-to-auxiliary connections. Moreover, our method can be combined with optimization-based auxiliary learning approaches. Extensive experiments with six tasks on the NYU v2, CityScapes, and Taskonomy datasets using VGG, ResNet, and ViT backbones validate its promising performance. The code is available at https://github.com/ethanygao/Aux-NAS.
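The following toy sketch illustrates why the primary-to-auxiliary links can be dropped at inference. It is not the authors' implementation: it assumes scalar "layers", additive feature fusion, and hypothetical names (`forward`, `a2p`, `p2a`, `l1_penalty`), chosen only to make the asymmetry concrete. Once every auxiliary-to-primary gate is zero, the primary output no longer depends on the auxiliary branch, so that branch and the links feeding it can be skipped entirely.

```python
import math


def layer(x, w):
    """One toy 'layer': a scalar weight followed by tanh."""
    return math.tanh(w * x)


def forward(x, w_pri, w_aux, a2p, p2a, use_aux=True):
    """Run two parallel branches with gated cross-task links.

    a2p[i] gates the auxiliary-to-primary link after layer i,
    p2a[i] gates the primary-to-auxiliary link after layer i.
    With use_aux=False only the primary branch is evaluated.
    Returns (primary_output, auxiliary_output or None).
    """
    h_pri, h_aux = x, x
    for i in range(len(w_pri)):
        new_pri = layer(h_pri, w_pri[i])
        if use_aux:
            new_aux = layer(h_aux, w_aux[i])
            # Assumed fusion rule: add the gated feature of the other branch.
            h_pri = new_pri + a2p[i] * new_aux
            h_aux = new_aux + p2a[i] * new_pri
        else:
            h_pri = new_pri
    return h_pri, (h_aux if use_aux else None)


def l1_penalty(a2p, lam):
    """l1 penalty on the auxiliary-to-primary gates only (lam is hypothetical)."""
    return lam * sum(abs(a) for a in a2p)


# Hypothetical toy parameters.
w_pri = [0.8, -1.2, 0.5]
w_aux = [1.1, 0.3, -0.7]
p2a = [0.6, 0.4, 0.9]            # primary-to-auxiliary gates, left unpenalized
a2p_train = [0.5, 0.2, 0.1]      # auxiliary-to-primary gates during the search
a2p_converged = [0.0, 0.0, 0.0]  # the gates after the penalty has zeroed them

mixed, _ = forward(0.3, w_pri, w_aux, a2p_train, p2a, use_aux=True)
full, aux_out = forward(0.3, w_pri, w_aux, a2p_converged, p2a, use_aux=True)
pruned, _ = forward(0.3, w_pri, w_aux, a2p_converged, p2a, use_aux=False)

# With zeroed a2p gates, dropping the auxiliary branch leaves the primary output unchanged.
assert full == pruned
```

In this sketch the auxiliary branch still receives primary features through `p2a` during training, but nothing flows back once `a2p` is zero, so the `use_aux=False` path computes the same primary output at single-branch cost.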
