Vanishing Contributions: A Unified Approach to Smoothly Transition Neural Models into Compressed Form
Lorenzo Nikiforos, Charalampos Antoniadis, Luciano Prono, Fabio Pareschi, Riccardo Rovatti, Gianluca Setti
TL;DR
This work addresses the accuracy degradation often observed when deep networks are compressed. It introduces Vanishing Contributions (VCON), a general training strategy that runs the original and compressed paths in parallel and gradually shifts the output contribution from the original path to the compressed one using a linearly decaying coefficient, specifically $ \bar g^{(i),t}_{\Theta,\tilde{\Theta}}(\cdot) = \beta^t f^{(i)}_{\Theta}(\cdot) + (1-\beta^t) g^{(i)}_{\tilde{\Theta}}(\cdot)$ with $\beta^t = \max(1-\frac{t}{Q},0)$, so that $\beta^t$ falls from 1 at step $t=0$ to 0 at step $t=Q$ and stays at 0 afterwards. The approach is evaluated across three compression modalities (pruning, binary quantization, and low-rank decomposition) on computer vision and natural language processing benchmarks, yielding typical gains above 3% and occasional improvements of up to 20%. Results show that VCON consistently outperforms standard baselines that apply compression directly and remains robust across granularities and tasks, underscoring its practicality for real-world deployment. Overall, VCON provides a general, lightweight extension to existing compression pipelines that improves stability, preserves accuracy, and can be readily integrated into diverse architectures.
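The blending rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each path is a plain callable that maps an input vector to an output vector; the helper names `vcon_beta` and `vcon_layer` are hypothetical and not taken from the paper, and training (gradient updates of $\Theta$ or $\tilde{\Theta}$) is left out.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Sequence[float]], Sequence[float]]


def vcon_beta(t: int, Q: int) -> float:
    """Linearly decaying coefficient beta^t = max(1 - t/Q, 0)."""
    if Q <= 0:
        raise ValueError("Q must be a positive number of steps")
    return max(1.0 - t / Q, 0.0)


def vcon_layer(f: Layer, g: Layer, t: int, Q: int) -> Callable[[Sequence[float]], Vector]:
    """Return the blended layer beta^t * f(x) + (1 - beta^t) * g(x) for step t.

    f is the original (uncompressed) path, g the compressed path.
    """
    beta = vcon_beta(t, Q)

    def blended(x: Sequence[float]) -> Vector:
        if beta == 0.0:
            # Transition finished: the original path contributes nothing,
            # so it does not need to be evaluated at all.
            return [float(b) for b in g(x)]
        fx, gx = f(x), g(x)
        return [beta * a + (1.0 - beta) * b for a, b in zip(fx, gx)]

    return blended


if __name__ == "__main__":
    f = lambda x: [2.0 * v for v in x]   # stand-in for the original layer
    g = lambda x: [0.0 for _ in x]       # stand-in for the compressed layer
    for t in range(6):
        print(t, vcon_beta(t, 4), vcon_layer(f, g, t, 4)([1.0, -0.5]))
```

With `Q = 4`, the coefficient takes the values 1.0, 0.75, 0.5 and 0.25, then 0.0 for every later step, so after `Q` steps the blended layer reproduces the compressed path alone. Skipping `f` once $\beta^t = 0$ is a choice made in this sketch; it leaves the output unchanged and reflects that the original path is no longer needed at that point.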
Abstract
The increasing scale of deep neural networks has led to a growing need for compression techniques such as pruning, quantization, and low-rank decomposition. While these methods are very effective in reducing memory, computation, and energy consumption, they often cause severe accuracy degradation when applied directly. We introduce Vanishing Contributions (VCON), a general approach for smoothly transitioning neural models into compressed form. Rather than directly replacing the original network with its compressed version, VCON executes the two in parallel during fine-tuning. The contribution of the original (uncompressed) model is progressively reduced, while that of the compressed model is gradually increased. This smooth transition allows the network to adapt over time, improving stability and mitigating accuracy degradation. We evaluate VCON on computer vision and natural language processing benchmarks, in combination with multiple compression strategies. Across all scenarios, VCON leads to consistent improvements: typical gains exceed 3%, and some configurations exhibit accuracy boosts of up to 20%. VCON thus provides a generalizable method that can be applied on top of existing compression techniques.
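Because VCON is combined with several compression strategies, the compressed path can be derived from the original weights in different ways. The sketch below, under illustrative assumptions, builds a compressed weight matrix $\tilde{\Theta}$ from the weights $\Theta$ of a single linear layer in two ways: unstructured magnitude pruning, and sign binarization with a mean-absolute-value scale in the style of XNOR-Net (an assumption, not necessarily the paper's quantizer). It then evaluates the blended output at fixed weights. All function names are hypothetical, and the fine-tuning updates are omitted.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def magnitude_prune(W: Matrix, keep: int) -> Matrix:
    """Unstructured pruning: keep the `keep` largest-|w| entries, zero the rest.

    Ties are broken by row-major position (Python's sort is stable).
    """
    rows, cols = len(W), len(W[0])
    order = sorted(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda ij: abs(W[ij[0]][ij[1]]),
        reverse=True,
    )
    kept = set(order[:keep])
    return [[W[i][j] if (i, j) in kept else 0.0 for j in range(cols)] for i in range(rows)]


def binarize(W: Matrix) -> Matrix:
    """Binary quantization alpha * sign(W), with alpha = mean |W| (assumed scale).

    Zero weights are mapped to +alpha.
    """
    n = sum(len(row) for row in W)
    alpha = sum(abs(w) for row in W for w in row) / n
    return [[alpha if w >= 0 else -alpha for w in row] for row in W]


def vcon_output(W: Matrix, W_c: Matrix, x: Vector, t: int, Q: int) -> Vector:
    """Blend the original path W x and the compressed path W_c x at step t."""
    beta = max(1.0 - t / Q, 0.0)
    return [beta * a + (1.0 - beta) * b for a, b in zip(matvec(W, x), matvec(W_c, x))]


if __name__ == "__main__":
    W = [[0.5, -2.0], [1.0, 0.25]]
    x = [1.0, 1.0]
    for name, W_c in (("pruned", magnitude_prune(W, 2)), ("binary", binarize(W))):
        print(name, [vcon_output(W, W_c, x, t, 4) for t in range(5)])
```

For `W = [[0.5, -2.0], [1.0, 0.25]]` and `x = [1.0, 1.0]`, the blended output moves from the original `[-1.5, 1.25]` at `t = 0` to the compressed path's output once `t >= Q`. A low-rank path (for example, a truncated SVD of `W`) would plug into `vcon_output` the same way, since the blend only needs the compressed layer's weights or output.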
