Change-of-Basis Pruning via Rotational Invariance
Alex Ning, Vainateya Rangaraju
TL;DR
The paper tackles the problem that structured pruning performance depends on the representation basis and proposes change-of-basis (CoB) pruning, enabled by two-subspace radial activations (TSRAs) that maintain rotational invariance within two activation subspaces, $\mathbb{R}^d = U \oplus V$. TSRAs allow orthogonal transforms to be merged into surrounding weights without extra parameters, providing a principled route to concentrate importance along selected axes. The activation $\sigma(x) = f_U(|x_U|, |x_V|)x_U + f_V(|x_U|, |x_V|)x_V$, with $x = x_U + x_V$, preserves rotation within each subspace, and width saturation for TSRAs is upper-bounded by $\dim(S) \le 2(d_{i-1})$ without bias and $\dim(S) \le 2(d_{i-1}+1)$ with bias. TSRA width saturation therefore scales favorably for deep networks, enabling practical high-width architectures. A PCA-based change of basis is used to concentrate activation magnitude along principal components, and experiments on VGG-16 with CIFAR-10 show that CoB improves pruning robustness under both fixed-ratio and threshold-based pruning, extending the reliable pruning frontier from roughly 30% to about 70% before fine-tuning and achieving 90-96% compression with modest accuracy loss after fine-tuning. The work demonstrates that rotationally invariant designs can enable principled CoB pruning and outlines avenues for broader activation families, initialization strategies, and alternative pruning schemes.
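The invariance claim above can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it assumes $U$ is spanned by the first `k` coordinates and $V$ by the rest, and the gate functions `f_u` and `f_v` are hypothetical choices that depend only on the two subspace norms. The helper names (`tsra`, `block_orthogonal`) are illustrative.

```python
import numpy as np

def tsra(x, k):
    """Two-subspace radial activation applied along the last axis.

    Assumption: U is the first k coordinates, V is the remainder.
    The gates below are hypothetical; any functions of the two norms would do.
    """
    x_u, x_v = x[..., :k], x[..., k:]
    r_u = np.linalg.norm(x_u, axis=-1, keepdims=True)
    r_v = np.linalg.norm(x_v, axis=-1, keepdims=True)
    f_u = np.tanh(r_u) / (1.0 + r_v)      # hypothetical gate f_U(|x_U|, |x_V|)
    f_v = 1.0 / (1.0 + r_u + r_v)         # hypothetical gate f_V(|x_U|, |x_V|)
    return np.concatenate([f_u * x_u, f_v * x_v], axis=-1)

def random_orthogonal(n, rng):
    """Random n x n orthogonal matrix from a QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def block_orthogonal(k, d, rng):
    """Orthogonal matrix that rotates U and V independently (block diagonal)."""
    q = np.zeros((d, d))
    q[:k, :k] = random_orthogonal(k, rng)
    q[k:, k:] = random_orthogonal(d - k, rng)
    return q

rng = np.random.default_rng(0)
d_in, d, d_out, k = 5, 8, 3, 4
W1 = rng.standard_normal((d, d_in))   # layer feeding the activation
W2 = rng.standard_normal((d_out, d))  # layer consuming the activation
Q = block_orthogonal(k, d, rng)
x = rng.standard_normal((16, d_in))   # batch of row vectors

# The activation commutes with Q: sigma(Q h) == Q sigma(h).
h = x @ W1.T
equivariant = np.allclose(tsra(h @ Q.T, k), tsra(h, k) @ Q.T)

# Merging Q into the surrounding weights leaves the network output unchanged
# and adds no parameters: W1 -> Q W1, W2 -> W2 Q^T.
y_ref = tsra(x @ W1.T, k) @ W2.T
W1_rot, W2_rot = Q @ W1, W2 @ Q.T
y_rot = tsra(x @ W1_rot.T, k) @ W2_rot.T
merged_ok = np.allclose(y_ref, y_rot)
```

Both `equivariant` and `merged_ok` should evaluate to `True`, because each gate sees only $|x_U|$ and $|x_V|$, which a block-diagonal orthogonal map preserves.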
Abstract
Structured pruning removes entire neurons or channels, but its effectiveness depends on how importance is distributed across the representation space. Change-of-basis (CoB) pruning addresses this challenge by applying orthogonal linear transformations that concentrate importance within certain dimensions. However, many standard deep learning architectures are not inherently invariant to such transformations. To enable compatibility, we introduce two-subspace radial activations (TSRAs): an activation family that is invariant to orthogonal linear transformations applied independently within its two activation subspaces. This invariance allows CoB transformations to be merged into surrounding weights without incurring extra parameters. We position this work as a proof of concept that a rotationally invariant design may offer a principled approach towards change-of-basis pruning. We do not provide an analysis of multiple TSRA candidates, nor do we explore weight initialization for any TSRAs. These limitations, combined with other necessary modifications we make to permit rotational invariance, result in a slight accuracy drop of $4.52\%$ compared to a ReLU-based control. However, using activation-magnitude importance, VGG-16 implementing our CoB+TSRA framework shows encouraging results on CIFAR-10. Under fixed-ratio structured pruning, CoB improves accuracy over a TSRA baseline at all pruning ratios and extends the reliable pruning frontier from roughly $30\%$ to $70\%$ of parameters without post-prune fine-tuning. Under threshold-based pruning strategies, CoB prunes $90$-$96\%$ of parameters while maintaining a $1$-$6\%$ accuracy drop after fine-tuning. Together, these results indicate that rotationally invariant architectures may offer a promising path towards CoB pruning.
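To illustrate how an orthogonal change of basis can concentrate importance, the sketch below builds a block-diagonal rotation from a per-subspace PCA of sampled activations. It rests on several assumptions that the text does not specify: the activations are synthetic, the PCA is uncentered, $U$ is the first `k` coordinates, and mean-square activation stands in for the activation-magnitude importance score. The function names are illustrative only.

```python
import numpy as np

def pca_rotation(acts):
    """Orthogonal matrix whose rows are principal directions, strongest first.

    Uses the uncentered second-moment matrix (an assumption) so the map
    stays purely linear and can be merged into adjacent weights.
    """
    second_moment = acts.T @ acts / acts.shape[0]
    eigvals, eigvecs = np.linalg.eigh(second_moment)  # eigenvalues ascending
    order = np.argsort(eigvals)[::-1]                 # reorder to descending
    return eigvecs[:, order].T

def cob_matrix(acts, k):
    """Block-diagonal rotation: one PCA basis per subspace (U first, then V)."""
    d = acts.shape[1]
    q = np.zeros((d, d))
    q[:k, :k] = pca_rotation(acts[:, :k])
    q[k:, k:] = pca_rotation(acts[:, k:])
    return q

def top_energy_fraction(acts, k, keep):
    """Share of total mean-square activation held by the `keep` strongest
    coordinates of each subspace, i.e. what survives pruning the rest."""
    energy = np.mean(acts ** 2, axis=0)
    kept_u = np.sort(energy[:k])[::-1][:keep].sum()
    kept_v = np.sort(energy[k:])[::-1][:keep].sum()
    return (kept_u + kept_v) / energy.sum()

rng = np.random.default_rng(0)
n, d, k, keep = 2000, 8, 4, 2

# Synthetic activations (assumption): two latent factors spread over all
# coordinates plus small noise, so no single axis dominates in the raw basis.
latent = rng.standard_normal((n, 2))
acts = latent @ rng.standard_normal((2, d)) + 0.1 * rng.standard_normal((n, d))

Q = cob_matrix(acts, k)
rotated = acts @ Q.T  # the same activations expressed in the new basis

before = top_energy_fraction(acts, k, keep)
after = top_energy_fraction(rotated, k, keep)
```

Here `after` is never smaller than `before`: within each subspace, the leading eigenvalues of the second-moment matrix bound the sum of any equally sized set of its diagonal entries. Coordinates left with little energy in the rotated basis are the natural candidates for structured removal.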
