Compression-aware Training of Neural Networks using Frank-Wolfe
Max Zimmer, Christoph Spiegel, Sebastian Pokutta
TL;DR
This work addresses how to train neural networks that remain accurate under compression (pruning and low-rank decomposition) without retraining. It introduces a compression-aware framework built on Stochastic Frank-Wolfe with norm-constrained feasible regions, notably the group-$k$-support and spectral-$k$-support norm balls, to drive structured sparsity and low-rankness during training. A gradient-rescaled learning rate is shown to be crucial for convergence and pruning stability, and is backed by a theoretical convergence result in the non-convex stochastic setting. Empirically, SparseFW achieves competitive or superior performance across image classification and semantic segmentation tasks and offers efficiency gains over nuclear-norm-based methods, indicating practical impact for deployable, compression-tolerant models.
Abstract
Many existing neural network pruning approaches rely on either retraining or inducing a strong bias in order to converge to a sparse solution throughout training. A third paradigm, 'compression-aware' training, aims to obtain state-of-the-art dense models that are robust to a wide range of compression ratios using a single dense training run while also avoiding retraining. We propose a framework centered around a versatile family of norm constraints and the Stochastic Frank-Wolfe (SFW) algorithm that encourages convergence to well-performing solutions while inducing robustness to convolutional filter pruning and low-rank matrix decomposition. Our method outperforms existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly fewer computational resources than approaches based on nuclear-norm regularization. Our findings indicate that dynamically adjusting the learning rate of SFW, as suggested by Pokutta et al. (2020), is crucial for the convergence and robustness of SFW-trained models, and we establish a theoretical foundation for that practice.
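To make the update rule concrete, the following is a minimal NumPy sketch of one Frank-Wolfe step over an elementwise $k$-support norm ball with a gradient-rescaled step size. It is an illustration under stated assumptions, not the authors' implementation: the function names `k_support_lmo` and `sfw_step` and the parameters `radius` and `lr` are hypothetical, momentum and mini-batching are omitted, and the sketch works on individual weights rather than on groups such as convolutional filters. The rescaling rule shown, which scales the step by the ratio of the gradient norm to the norm of the update direction and caps it at 1, is our reading of the practice attributed to Pokutta et al. (2020).

```python
import numpy as np

def k_support_lmo(grad, k, radius):
    """Linear minimization oracle: argmin over v of <v, grad>, where v ranges
    over the k-support norm ball of the given radius (assumes 1 <= k <= grad.size).
    The minimizer keeps the k largest-magnitude gradient entries, flips their
    sign, and rescales them to Euclidean norm `radius`."""
    flat = grad.ravel()
    top = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the top-k magnitudes
    v = np.zeros_like(flat)
    norm = np.linalg.norm(flat[top])
    if norm > 0:
        v[top] = -radius * flat[top] / norm
    return v.reshape(grad.shape)

def sfw_step(theta, grad, k, radius, lr):
    """One Frank-Wolfe step with a gradient-rescaled step size (assumed form)."""
    v = k_support_lmo(grad, k, radius)
    direction = v - theta
    dnorm = np.linalg.norm(direction)
    if dnorm == 0:
        return theta.copy()
    # Rescale the step by ||grad|| / ||v - theta|| and cap it at 1, so the
    # iterate stays a convex combination of feasible points.
    step = min(lr * np.linalg.norm(grad) / dnorm, 1.0)
    return theta + step * direction

# Toy usage: start from the origin (feasible) and take one step.
grad = np.array([3.0, -4.0, 0.5, 0.0])
theta = sfw_step(np.zeros(4), grad, k=2, radius=5.0, lr=0.1)
```

Because every vertex returned by the oracle has at most `k` nonzero entries, the iterates are convex combinations of sparse points, which is the mechanism that biases training toward solutions that tolerate pruning.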
