PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning

Arun Mallya, Svetlana Lazebnik

TL;DR

PackNet introduces a weight-based, iterative pruning and retraining framework that sequentially adds multiple tasks to a single network while preserving performance on prior tasks, thereby mitigating catastrophic forgetting without storing full task-specific models. By pruning redundant parameters after each task and freezing the remaining weights, new tasks reuse existing representations and require only lightweight per-task masks, achieving accuracy close to that of independently trained networks with modest storage overhead. The approach scales across architectures (VGG-16, ResNet, DenseNet) and datasets (ImageNet, Places365, CUBS, Cars, Flowers), outperforms proxy-loss methods such as LwF, and handles both large-scale and fine-grained classification tasks. These results suggest a practical, scalable path to multi-task learning within a single network, with potential extensions toward jointly learning weights and sparsity masks.

Abstract

This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that can then be employed to learn new tasks. By performing iterative pruning and network re-training, we are able to sequentially "pack" multiple tasks into a single network while ensuring minimal drop in performance and minimal storage overhead. Unlike prior work that uses proxy losses to maintain accuracy on older tasks, we always optimize for the task at hand. We perform extensive experiments on a variety of network architectures and large-scale datasets, and observe much better robustness against catastrophic forgetting than prior work. In particular, we are able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task. Code available at https://github.com/arunmallya/packnet

Paper Structure

This paper contains 11 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of the evolution of a 5$\times$5 filter with steps of training. Initial training of the network for Task I learns a dense filter as illustrated in (a). After pruning by 60% (15/25) and re-training, we obtain a sparse filter for Task I, as depicted in (b), where white circles denote zero-valued weights. Weights retained for Task I are kept fixed for the remainder of the method, and are not eligible for further pruning. We allow the pruned weights to be updated for Task II, leading to filter (c), which shares weights learned for Task I. Another round of pruning by 33% (5/15) and re-training leads to filter (d), which is the filter used for evaluating on Task II (note that weights for Task I, in gray, are not considered for pruning). Hereafter, weights for Task II, depicted in orange, are kept fixed. This process is repeated for as many tasks as desired, or until we run out of pruned weights, as shown in filter (e). The final filter (e) for Task III shares weights learned for Tasks I and II. At test time, appropriate masks are applied depending on the selected task so as to replicate filters learned for the respective tasks.
  • Figure 2: Change in errors on prior tasks as new tasks are added for LwF (left) and our method (right). For LwF, errors on prior datasets increase with every added dataset. For our pruning-based method, the error remains the same even after a new dataset is added.
  • Figure 3: Dependence of errors on individual tasks on the order of task addition (see text for details). Each displayed value and error bar are obtained from 6 different runs. We use an initial pruning ratio of 50% for the ImageNet-trained VGG-16 and a pruning ratio of 75% after each dataset is added. The 0.50, 0.75, 0.75 pruning column of Table \ref{table:results} reports the average over orderings.
  • Figure 4: This plot measures the change in top-1 error with pruning. The values above correspond to the case when the respective dataset is added as the first task, to an ImageNet-trained VGG-16 that is 50% pruned, except for the values corresponding to the ImageNet dataset, which correspond to initial pruning. Note that the 0.75 pruning ratio values correspond to the blue bars in Figure \ref{fig:training_order_nobias}.
  • Figure 5: This figure shows that having free parameters in the lower layers of the network is essential for good performance. The numbers above are obtained when a task is added to the 50% pruned VGG-16 network and only the specified layers are fine-tuned, without any further pruning.
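
The bookkeeping described in the Figure 1 caption can be traced with a small script. This is only a minimal sketch, not the authors' implementation: it assumes magnitude-based pruning (smallest absolute values are removed first), replaces real training with a hypothetical `train_stub` that fills free weights with random values, and omits the re-training step that follows each pruning round. The names `prune_smallest`, `train_stub`, `filter_for`, and the `owner` array are all illustrative.

```python
import random

def prune_smallest(weights, owner, task, fraction):
    """Zero the smallest-magnitude `fraction` of the weights owned by `task`.

    Pruned positions become free (owner 0). Weights owned by other tasks
    are never touched, mirroring "not eligible for further pruning".
    """
    idx = [i for i, o in enumerate(owner) if o == task]
    n_prune = round(fraction * len(idx))
    for i in sorted(idx, key=lambda j: abs(weights[j]))[:n_prune]:
        weights[i] = 0.0
        owner[i] = 0

def train_stub(weights, owner, task, rng):
    """Stand-in for training (assumption): only free weights are updated."""
    for i, o in enumerate(owner):
        if o == 0:
            weights[i] = rng.uniform(-1.0, 1.0)
            owner[i] = task

def filter_for(weights, owner, task):
    """Test-time mask: keep weights owned by tasks 1..task, zero the rest."""
    return [w if 1 <= o <= task else 0.0 for w, o in zip(weights, owner)]

rng = random.Random(0)
weights = [0.0] * 25   # flattened 5x5 filter
owner = [0] * 25       # 0 = free, k = fixed for task k

train_stub(weights, owner, 1, rng)        # (a) dense filter for Task I
prune_smallest(weights, owner, 1, 0.60)   # (b) prune 15 of 25
task1 = filter_for(weights, owner, 1)

train_stub(weights, owner, 2, rng)        # (c) Task II trains the 15 free weights
prune_smallest(weights, owner, 2, 1 / 3)  # (d) prune 5 of 15
task2 = filter_for(weights, owner, 2)

train_stub(weights, owner, 3, rng)        # (e) Task III takes the last 5 weights
counts = {t: owner.count(t) for t in (0, 1, 2, 3)}
print(counts)
```

Under these assumptions the 25 weights end up split 10 / 10 / 5 across Tasks I, II, and III, and applying the Task I or Task II mask after Task III has been added reproduces exactly the filter that was evaluated for that task earlier, which is the property shown in Figure 2 (right).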