Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning

Leonardo Iurada, Marco Ciccone, Tatiana Tommasi

TL;DR

This work tackles the cost of training large vision models by enabling pruning at initialization through a data-aware upper bound on the NTK trace. It introduces Path eXclusion (PX), a method that decomposes the network into individual paths to derive a data-driven pruning saliency that preserves the NTK spectrum of the dense network. PX locates effective lottery-ticket subnetworks even at extreme sparsity and transfers well from pre-trained starting points, with spectral preservation and layer-width stability observed across tasks. The approach yields substantial cost and memory savings while maintaining competitive or superior performance, and the authors provide code for reproducibility.
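Decomposing a network into individual paths can be illustrated on a toy example. The sketch below assumes a bias-free two-layer ReLU network with a scalar output and arbitrary weights (a toy, not one of the architectures evaluated in the paper). Its output equals a sum of per-path terms, each being the product of the weights along the path times the input entry, multiplied by a 0/1 gate that records whether the path's hidden unit is active for that input. The gates are the data-dependent part; the weight products do not depend on the input.

```python
from itertools import product


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def forward(W1, w2, x):
    """Standard forward pass of a bias-free 2-layer ReLU network with scalar output."""
    hidden = [relu(sum(W1[j][i] * x[i] for i in range(len(x)))) for j in range(len(W1))]
    return sum(w2[j] * hidden[j] for j in range(len(w2)))


def path_decomposition(W1, w2, x):
    """Rewrite the same output as a sum over input -> hidden -> output paths.

    Each path (i, j) contributes (W1[j][i] * w2[j]) * x[i] * gate_j(x), where
    gate_j(x) is 1 if hidden unit j is active for this input and 0 otherwise.
    """
    # Data-dependent part: which hidden units are active for this input.
    gates = [1.0 if sum(W1[j][i] * x[i] for i in range(len(x))) > 0.0 else 0.0
             for j in range(len(W1))]
    contributions = {}
    for i, j in product(range(len(x)), range(len(W1))):
        weight_product = W1[j][i] * w2[j]  # data-independent part of the path
        contributions[(i, j)] = weight_product * x[i] * gates[j]
    return contributions


W1 = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]  # 3 hidden units, 2 inputs
w2 = [1.0, -0.5, 2.0]
x = [1.0, 2.0]

paths = path_decomposition(W1, w2, x)
print(forward(W1, w2, x), sum(paths.values()))  # both computations agree
```

With this input, hidden unit 0 is inactive, so both of its paths contribute zero even though their weights are nonzero.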

Abstract

Recent advances in neural network pruning have shown that it is possible to reduce the computational costs and memory demands of deep learning models before training. We focus on this framework and propose a new pruning-at-initialization algorithm that leverages Neural Tangent Kernel (NTK) theory to align the training dynamics of the sparse network with those of the dense one. Specifically, we show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account by providing an analytical upper bound on the NTK's trace, obtained by decomposing neural networks into individual paths. This leads to our Path eXclusion (PX), a foresight pruning method designed to preserve the parameters that most influence the NTK's trace. PX is able to find lottery tickets (i.e., good paths) even at high sparsity levels and largely reduces the need for additional training. When applied to pre-trained models, it extracts subnetworks directly usable for several downstream tasks, resulting in performance comparable to that of the dense counterpart but with substantial cost and computational savings. Code available at: https://github.com/iurada/px-ntk-pruning
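For a scalar-output network, the empirical NTK on a batch is the Gram matrix of per-example parameter gradients, Θ_nm = ⟨∇_θ f(x_n), ∇_θ f(x_m)⟩, so its trace is Σ_n ‖∇_θ f(x_n)‖². The sketch below computes this exact trace for the same kind of toy bias-free two-layer ReLU network before and after removing a single weight. The weights are arbitrary, and the helper names (`ntk_gram`, `ntk_trace`, `prune`) are hypothetical. Pruned weights are assumed to be frozen at zero, so they drop out of the gradient. This does not implement the paper's analytical upper bound. It only shows why the data matter: a weight feeding a hidden unit that is inactive on every input in the batch can be removed without changing the trace, while removing a weight on an active path shrinks it.

```python
def forward_grads(W1, w2, x):
    """Analytic gradient of f(x) = sum_j w2[j] * relu(W1[j] . x) w.r.t. every parameter."""
    g = {}
    for j, row in enumerate(W1):
        z = sum(w * xi for w, xi in zip(row, x))
        active = 1.0 if z > 0.0 else 0.0           # ReLU derivative (data-dependent gate)
        g[("w2", j)] = max(z, 0.0)                 # df/dw2[j] = relu(z_j)
        for i, xi in enumerate(x):
            g[("W1", j, i)] = w2[j] * active * xi  # df/dW1[j][i]
    return g


def apply_mask(W1, w2, mask):
    """Zero out pruned weights; `mask` maps parameter keys to 0 (pruned) or 1 (kept)."""
    W1m = [[w * mask[("W1", j, i)] for i, w in enumerate(row)] for j, row in enumerate(W1)]
    w2m = [w * mask[("w2", j)] for j, w in enumerate(w2)]
    return W1m, w2m


def ntk_gram(W1, w2, batch, mask=None):
    """Empirical NTK Gram matrix K[n][m] = <grad f(x_n), grad f(x_m)> over trainable params."""
    if mask is not None:
        W1, w2 = apply_mask(W1, w2, mask)
    grads = [forward_grads(W1, w2, x) for x in batch]
    keep = [p for p in grads[0] if mask is None or mask[p]]  # pruned params stay at 0
    return [[sum(gn[p] * gm[p] for p in keep) for gm in grads] for gn in grads]


def ntk_trace(W1, w2, batch, mask=None):
    K = ntk_gram(W1, w2, batch, mask)
    return sum(K[n][n] for n in range(len(K)))


def prune(mask, param):
    """Return a copy of `mask` with one parameter removed."""
    new = dict(mask)
    new[param] = 0
    return new


W1 = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
w2 = [1.0, -0.5, 2.0]
batch = [[1.0, 2.0], [-1.0, 0.5]]
full = {p: 1 for p in forward_grads(W1, w2, batch[0])}

dense = ntk_trace(W1, w2, batch)
# Hidden unit 0 is inactive on both inputs, so its weights carry no NTK mass here.
no_dead = ntk_trace(W1, w2, batch, prune(full, ("W1", 0, 0)))
# W1[2][1] lies on the path that carries most of the gradient signal for this batch.
no_active = ntk_trace(W1, w2, batch, prune(full, ("W1", 2, 1)))
print(dense, no_dead, no_active)
```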
Paper Structure (17 sections, 15 equations, 12 figures, 5 tables, 1 algorithm)

Figures (12)

  • Figure 1: Our Path eXclusion (PX) involves two copies of the original dense network. One copy (bottom left) estimates data-relevant paths, depicted by blue arrows, and injects the extracted information into the other network (blue shading). The other copy (bottom right) evaluates path relevance in terms of parameter connections in the network, illustrated by black connections. These estimations are then combined to score each parameter, finding a subnetwork by retaining only the most relevant paths based on data, architecture, and initialization. The identified sparse subnetwork closely mimics the training dynamics of the original dense network.
  • Figure 2: Average classification accuracy at different sparsity levels on CIFAR-10 using ResNet-20, CIFAR-100 using VGG-16, and Tiny-ImageNet using ResNet-18. Each experiment is repeated three times. Shaded regions show the standard deviation.
  • Figure 3: Average classification accuracy at different sparsity levels on CIFAR-10, CIFAR-100 and Tiny-ImageNet using a pre-trained ResNet-50 as the architecture. The first column reports the results when starting from supervised ImageNet pre-training, the second column when starting from MoCov2 pre-training on ImageNet, and the third column when starting from CLIP. Each experiment is repeated three times. Shaded regions show the standard deviation.
  • Figure 4: Average mean Intersection over Union (mIoU) at different sparsity levels on Pascal VOC2012 using DeepLabV3+ with pre-trained ResNet-50 as the backbone. Each experiment is repeated three times. Standard deviations are in shaded colors.
  • Figure 5: Fixed-Weight-NTK spectrum of ResNet-20 on the CIFAR-10 dataset at a sparsity ratio of 93.12%.
  • ...and 7 more figures
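The Figure 1 caption describes two ingredients: a data-dependent estimate of which paths the data actually use, and an estimate of path relevance based on the network's connections. These are combined into one score per parameter, and only the top-scoring parameters are kept. The toy below only mirrors that structure on a bias-free two-layer ReLU network. Its scoring rule and all function names (`data_path_relevance`, `px_like_scores`, `top_k_mask`) are assumptions made for illustration. The rule uses mean |input| × activation gate as the data term and the product of absolute weights as the path term, summed over the paths through each parameter. This is not the PX saliency derived in the paper; the linked repository contains the authors' implementation.

```python
def gates(W1, x):
    """ReLU activation pattern of the hidden layer for input x (1.0 = active)."""
    return [1.0 if sum(w * xi for w, xi in zip(row, x)) > 0.0 else 0.0 for row in W1]


def data_path_relevance(W1, batch):
    """Hypothetical data-dependent term: how strongly path i -> j is used on the batch.

    Relevance of path (i, j) = mean over the batch of |x_i| * gate_j(x).
    """
    rel = {}
    for j in range(len(W1)):
        for i in range(len(batch[0])):
            rel[(i, j)] = sum(abs(x[i]) * gates(W1, x)[j] for x in batch) / len(batch)
    return rel


def px_like_scores(W1, w2, batch):
    """Hypothetical combination: data relevance times |weight product| of each path,
    accumulated onto every parameter that the path passes through."""
    rel = data_path_relevance(W1, batch)
    scores = {("W1", j, i): 0.0 for j in range(len(W1)) for i in range(len(W1[0]))}
    scores.update({("w2", j): 0.0 for j in range(len(w2))})
    for (i, j), r in rel.items():
        path_score = r * abs(W1[j][i]) * abs(w2[j])  # data term x weight-based term
        scores[("W1", j, i)] += path_score
        scores[("w2", j)] += path_score
    return scores


def top_k_mask(scores, k):
    """Keep the k highest-scoring parameters (1) and prune the rest (0)."""
    kept = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return {p: int(p in kept) for p in scores}


W1 = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
w2 = [1.0, -0.5, 2.0]
batch = [[1.0, 2.0], [-1.0, 0.5]]

mask = top_k_mask(px_like_scores(W1, w2, batch), k=4)  # keep 4 of 9 weights
print(sorted(p for p, m in mask.items() if m))
```

Weights feeding hidden unit 0 score zero here because that unit is inactive on both inputs, whereas the weight-only term on its own would give them a nonzero score.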