Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Arun Mallya, Dillon Davis, Svetlana Lazebnik

TL;DR

The paper introduces piggyback masks, a method to adapt a fixed pretrained backbone to multiple tasks by learning per-task binary masks that gate existing weights, incurring only about 1 bit per parameter per task. The approach enables end-to-end differentiable learning of masks without modifying backbone weights, yielding performance comparable to fine-tuned dedicated networks across diverse datasets and architectures while remaining agnostic to task order. Analyses show initialization and layer-wise sparsity patterns influence effectiveness, and task-specific batchnorm adaptation can mitigate domain-shift gaps. The method demonstrates strong results on Visual Decathlon and competitive semantic segmentation, highlighting practical value for multi-task deployment with low storage overhead.

Abstract

This work presents a method for adapting a single, fixed deep neural network to multiple tasks without affecting performance on already learned tasks. By building upon ideas from network quantization and pruning, we learn binary masks that piggyback on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task. These masks are learned in an end-to-end differentiable fashion, and incur a low overhead of 1 bit per network parameter, per task. Even though the underlying network is fixed, the ability to mask individual weights allows for the learning of a large number of filters. We show performance comparable to dedicated fine-tuned networks for a variety of classification tasks, including those with large domain shifts from the initial task (ImageNet), and a variety of network architectures. Unlike prior work, we do not suffer from catastrophic forgetting or competition between tasks, and our performance is agnostic to task ordering. Code available at https://github.com/arunmallya/piggyback.
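
The storage claim above (1 bit per network parameter, per task) can be made concrete with a little arithmetic. The sketch below compares the cost of storing one binary mask per task against storing a full fine-tuned copy of the network per task. The parameter count, the number of tasks, and the 32-bit weight width are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope storage comparison.
# Assumptions (hypothetical, not from the paper): a backbone with
# 100 million parameters, 5 added tasks, 32-bit floating-point weights.

def mask_storage_bytes(num_params, num_tasks, bits_per_mask_entry=1):
    """Bytes needed to store one binary mask per task."""
    return num_params * num_tasks * bits_per_mask_entry // 8


def finetuned_storage_bytes(num_params, num_tasks, bits_per_weight=32):
    """Bytes needed to store one full fine-tuned network copy per task."""
    return num_params * num_tasks * bits_per_weight // 8


NUM_PARAMS = 100_000_000  # hypothetical backbone size
NUM_TASKS = 5             # hypothetical number of added tasks

mask_bytes = mask_storage_bytes(NUM_PARAMS, NUM_TASKS)
copy_bytes = finetuned_storage_bytes(NUM_PARAMS, NUM_TASKS)
ratio = copy_bytes // mask_bytes

print(f"masks:            {mask_bytes / 1e6:.1f} MB")
print(f"fine-tuned copies: {copy_bytes / 1e6:.1f} MB")
print(f"ratio:            {ratio}x")
```

Under these assumptions the per-task masks take 1/32 of the space of per-task fine-tuned copies; the ratio depends only on the assumed weight width, not on the network size or the number of tasks.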

Paper Structure

This paper contains 10 sections, 3 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Overview of our method for learning piggyback masks for fixed backbone networks. During training, we maintain a set of real-valued weights $m^r$ which are passed through a thresholding function to obtain binary-valued masks $m$. These masks are applied to the weights $W$ of the backbone network in an elementwise fashion, keeping individual weights active, or masked out. The gradients obtained through backpropagation of the task-specific loss are used to update the real-valued mask weights. After training, the real-valued mask weights are discarded and only the thresholded mask is retained, giving one network mask per task.
  • Figure 1: Summary of datasets used.
  • Figure 2: Datasets unlike ImageNet.
  • Figure 3: Percentage of weights masked out per ImageNet pre-trained VGG-16 layer. Datasets similar to ImageNet share a lot of the lower layers, and require fewer changes. The number of masked out weights increases with depth of layer.
  • Figure 4: Mixed training of layers using finetuning from scratch and piggyback masking.
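
The training loop described in the Figure 1 caption can be illustrated with a minimal sketch on a single linear layer. This is not the authors' implementation (see their repository for that): the layer size, threshold value, initial mask value, learning rate, and squared-error loss are all illustrative assumptions. The threshold is not differentiable, so the sketch also assumes that the gradient with respect to the binary mask $m$ is applied directly to the real-valued weights $m^r$.

```python
# Minimal sketch of mask learning on one linear layer, y = (W * m) x.
# Assumptions (not from the source): threshold, initial m^r, learning rate,
# squared-error loss, and passing the gradient of m straight to m^r.

THRESHOLD = 0.5  # hypothetical threshold value


def binarize(real_mask, threshold=THRESHOLD):
    """Threshold real-valued mask weights m^r into a binary mask m."""
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in real_mask]


def forward(weights, mask, x):
    """Compute y = (W * m) x, with the mask applied elementwise to W."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]


def train_step(weights, real_mask, x, target, lr=0.125):
    """One gradient step on m^r; the backbone weights W are never modified."""
    y = forward(weights, binarize(real_mask), x)
    # Gradient of 0.5 * sum((y - target)^2) with respect to y.
    grad_y = [yj - tj for yj, tj in zip(y, target)]
    # dL/dm[j][i] = grad_y[j] * W[j][i] * x[i]; used as the gradient of m^r.
    return [
        [mr - lr * gy * w * xi for mr, w, xi in zip(mr_row, w_row, x)]
        for mr_row, w_row, gy in zip(real_mask, weights, grad_y)
    ]


W = [[1.0, 2.0], [3.0, -1.0]]            # fixed backbone weights
real_mask = [[0.75, 0.75], [0.75, 0.75]]  # m^r starts above the threshold
x, target = [1.0, 1.0], [1.0, 3.0]        # toy input and target

for _ in range(5):
    real_mask = train_step(W, real_mask, x, target)

final_mask = binarize(real_mask)  # only this binary mask is kept per task
print(final_mask)
print(forward(W, final_mask, x))
```

In this toy setup the target is reachable by switching off the second column of `W`, and the learned mask does exactly that while `W` itself stays unchanged. After training, `real_mask` can be discarded and only `final_mask` stored, matching the one-mask-per-task description in the caption.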