Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR
Julian Irigoyen, Arthur Söhler, Andreas Søeborg Kirkedal
TL;DR
This work reframes pruning in ASR as an implicit regularizer rather than merely a compression technique, using gradient- and Fisher-based sensitivity diagnostics to guide one-shot magnitude pruning of Whisper-small. By analyzing encoder and decoder components jointly, the authors uncover architectural asymmetries: decoder FFNs are pruning-fragile, while decoder self-attention and late encoder layers harbor redundancy whose removal improves generalization, even without fine-tuning. On LibriSpeech test-other, pruning 50% of decoder self-attention reduces WER by 2.38% absolute (20.44% relative), and pruning the last four encoder layers at 50% reduces it by 1.72% absolute (14.8% relative); the gains persist on Common Voice and TED-LIUM. Sensitivity-aware pruning also enables aggressive one-shot compression (up to ~40.8% sparsity) with near-baseline WER and CER improvements, positioning pruning as a first-class architectural tool for Transformer-based ASR models and as a guide for applying it to other architectures.
Abstract
We challenge the conventional view of neural network pruning as solely a compression technique, demonstrating that one-shot magnitude pruning serves as a powerful implicit regularizer for ASR. Using Whisper-small, we combine gradient- and Fisher-based sensitivity diagnostics with targeted, component-wise pruning. This analysis reveals architectural asymmetries: decoder FFNs are pruning-fragile, whereas decoder self-attention and the last encoder layers contain redundancy that, when removed, improves generalization. Without fine-tuning, pruning 50% of decoder self-attention reduces WER by 2.38% absolute (20.44% relative) on LibriSpeech test-other; pruning the last four encoder layers at 50% instead yields a 1.72% absolute (14.8% relative) improvement. These gains persist on the Common Voice and TED-LIUM datasets. Beyond its regularization benefits, our sensitivity-aware approach enables more aggressive one-shot compression: at 40% sparsity, where established global pruning approaches fail catastrophically, our method preserves near-baseline accuracy. This positions pruning as a first-class architectural design tool: knowing where to prune is as important as knowing how much to prune.
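To make the idea of targeted, component-wise one-shot magnitude pruning concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`magnitude_prune`, `prune_components`) and the component keys (`decoder.self_attn`, `decoder.ffn`) are hypothetical, and the sketch assumes unstructured pruning applied independently to each selected weight matrix, with no fine-tuning afterwards.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(round(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return weights.copy()
    magnitudes = np.abs(weights).ravel()
    # Indices of the k smallest magnitudes; their relative order does not matter.
    drop = np.argpartition(magnitudes, k - 1)[:k]
    mask = np.ones(weights.size, dtype=bool)
    mask[drop] = False
    return weights * mask.reshape(weights.shape)


def prune_components(params: dict, targets: set, sparsity: float) -> dict:
    """Prune only the named components in one shot; copy all others unchanged."""
    return {
        name: magnitude_prune(w, sparsity) if name in targets else w.copy()
        for name, w in params.items()
    }


# Toy stand-ins for model weights (hypothetical names and shapes).
rng = np.random.default_rng(0)
params = {
    "decoder.self_attn": rng.normal(size=(8, 8)),
    "decoder.ffn": rng.normal(size=(8, 32)),
}

# Prune 50% of the redundant component, leave the fragile one intact.
pruned = prune_components(params, targets={"decoder.self_attn"}, sparsity=0.5)
```

In this sketch the choice of `targets` is where a sensitivity diagnostic would enter: components judged fragile (here the FFN) are simply left out of the set, while components judged redundant are pruned at the chosen ratio.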
