Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers
Zhu Liao, Nour Hezbri, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione
TL;DR
The paper addresses the inefficiency of overparameterized deep networks with Till the Layers Collapse (TLC), a method that removes entire layers to reduce depth and latency. TLC uses the parameters of batch normalization (BN) layers to infer an ON/OFF state for individual neurons and, from these states, to decide which layers can be removed without substantial accuracy loss. Layers are ranked by their impact on performance and the least important ones are removed, with the remaining ON neurons optionally linearized. The authors report depth compression with competitive or improved accuracy on image classification and NLP tasks, often outperforming BN-based pruning baselines, for architectures including ResNet-18, Swin-T, MobileNet-V2, VGG-16bn, BERT, and RoBERTa on multiple datasets. Because it is driven by BN statistics, the approach offers an accessible route to scalable model compression that lowers compute and energy demands.
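To make the ON/OFF idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each BN layer is followed by a ReLU and that the normalized input to each channel is roughly standard normal, so the pre-activation is approximately Gaussian with mean beta and standard deviation |gamma|. Under that assumption, the probability that a neuron is ON is Phi(beta / |gamma|). The function names, the layer score (mean of min(p, 1 - p) over channels), and the example layer names are all hypothetical choices made for illustration; the paper's actual ranking criterion may differ.

```python
import math


def on_probability(gamma, beta):
    """P(gamma * z + beta > 0) for z ~ N(0, 1): chance that a ReLU after BN is ON.

    Assumption: the BN-normalized input is approximately standard normal.
    """
    if gamma == 0.0:
        # Degenerate channel: the output is the constant beta.
        return 1.0 if beta > 0 else 0.0
    return 0.5 * (1.0 + math.erf(beta / (abs(gamma) * math.sqrt(2.0))))


def layer_nonlinearity_score(gammas, betas):
    """Hypothetical score: mean of min(p, 1 - p) over the channels of one layer.

    Close to 0 when every neuron is almost always ON (acts linearly) or
    almost always OFF (outputs zero); at most 0.5.
    """
    probs = [on_probability(g, b) for g, b in zip(gammas, betas)]
    return sum(min(p, 1.0 - p) for p in probs) / len(probs)


def rank_layers(bn_params):
    """Sort layer names from lowest to highest score (removal candidates first).

    bn_params maps a layer name to a (gammas, betas) pair of equal-length lists.
    """
    scores = {
        name: layer_nonlinearity_score(gammas, betas)
        for name, (gammas, betas) in bn_params.items()
    }
    return sorted(scores, key=scores.get)


# Illustrative BN parameters (made up, not taken from any trained model).
example = {
    "block1": ([1.0, 1.0], [0.0, 0.0]),   # each neuron ON half the time
    "block2": ([0.1, 0.1], [1.0, -1.0]),  # one always ON, one always OFF
    "block3": ([1.0, 1.0], [1.0, -1.0]),  # ON about 84% and 16% of the time
}
print(rank_layers(example))  # ['block2', 'block3', 'block1']
```

In this sketch, "block2" ranks first because its ReLU never switches state on typical inputs, so the layer behaves almost linearly and is the cheapest to remove or merge. A real pipeline would read gamma and beta from the trained model's BN layers and check accuracy after each removal.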
Abstract
Deep neural networks are widely used today because they can handle a variety of complex tasks, and their generality makes them powerful tools in modern technology. However, they are often overparameterized, and using such large models consumes substantial computational resources. In this paper, we introduce Till the Layers Collapse (TLC), a method that compresses deep neural networks through the lenses of batch normalization layers. By reducing the depth of these networks, our method decreases their computational requirements and overall latency. We validate our method on popular models such as Swin-T, MobileNet-V2, and RoBERTa, across both image classification and natural language processing (NLP) tasks.
