LayerCollapse: Adaptive compression of neural networks

Soheil Zibakhsh Shabgahi; Mohammad Sohail Shariff; Farinaz Koushanfar

TL;DR

This work presents LayerCollapse, a novel structured pruning method to reduce the depth of fully connected layers, and proposes an innovative regularizer that promotes shallow fully connected layers, compressing the model with minimal performance impact.

Abstract

Handling the ever-increasing scale of contemporary deep learning and transformer-based models poses a significant challenge. Overparameterized Transformer networks outperform prior art in natural language processing and computer vision. These models contain hundreds of millions of parameters, demanding significant computational resources and making them prone to overfitting on downstream tasks. In this work, we present LayerCollapse, a novel structured pruning method to reduce the depth of fully connected layers. We propose an innovative regularizer that promotes shallow fully connected layers, compressing the model with minimal performance impact. This regularizer enables post-training compression without fine-tuning while preserving performance. LayerCollapse controls model expressiveness by regularizing the activation functions between fully connected layers, modulating them to linearity. A linear activation function collapses the rank of a transformation to the rank of the corresponding linear transformation, which demands fewer resources from the hardware. We demonstrate the effectiveness of LayerCollapse by showing its compression capabilities on sentiment analysis, text generation, and image classification benchmarks.
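
The collapse step described in the abstract rests on a simple identity: once the activation between two fully connected layers is linear, W2(W1 x + b1) + b2 equals (W2 W1) x + (W2 b1 + b2). The following is a minimal sketch of that identity in plain Python, not the authors' implementation. The function names (`mlp`, `collapse`), the toy weights, and the use of a PReLU-style `slope` parameter are assumptions made for illustration.

```python
def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def prelu(v, slope):
    """PReLU-style activation: identity for t >= 0, slope * t otherwise."""
    return [t if t >= 0 else slope * t for t in v]

def mlp(x, w1, b1, w2, b2, slope):
    """Two fully connected layers with a PReLU-style activation in between."""
    hidden = [h + b for h, b in zip(matvec(w1, x), b1)]
    hidden = prelu(hidden, slope)
    return [o + b for o, b in zip(matvec(w2, hidden), b2)]

def collapse(w1, b1, w2, b2):
    """Merge the two layers into one, valid only when the activation is linear."""
    w = matmul(w2, w1)                                  # W = W2 W1
    b = [o + bb for o, bb in zip(matvec(w2, b1), b2)]   # b = W2 b1 + b2
    return w, b

# Toy weights (hypothetical): input size 2, hidden size 3, output size 2.
w1 = [[1, -2], [0, 3], [-1, 1]]
b1 = [1, -1, 0]
w2 = [[2, 0, -1], [1, 1, 1]]
b2 = [0, 5]
x = [1, 2]

w, b = collapse(w1, b1, w2, b2)
collapsed_out = [o + bb for o, bb in zip(matvec(w, x), b)]
linear_out = mlp(x, w1, b1, w2, b2, slope=1.0)  # fully linear activation
relu_out = mlp(x, w1, b1, w2, b2, slope=0.0)    # ordinary ReLU

# Parameter counts before and after collapsing (weights plus biases).
params_before = 3 * 2 + 3 + 2 * 3 + 2
params_after = 2 * 2 + 2
```

In this toy setting the collapsed layer reproduces the two-layer output exactly when the slope is 1, and differs from it when the slope is 0, which is why the activation has to be driven to linearity before the layers can be merged without changing the function. The parameter saving shown here depends on the hidden layer being wider than the input and output, as is assumed in the toy shapes.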

Paper Structure (31 sections, 24 equations, 4 figures, 6 tables, 2 algorithms)

Introduction
Background
Role and Structure of MLPs
MLPs in Modern Architectures
Importance of Non-Linear Activations
Applications and Challenges
Methodology
Formal Definition
Compression Ratio
Prior Distribution Analysis of Regularization
Alternative Activation Functions
Experiments
Regularization Performance of LayerCollapse
Pre-Trained Models
Trained from Scratch
...and 16 more sections

Figures (4)

Figure 1: Regression performance of a two-layer neural network, showing how varying the linearity of the ReLU activation affects overfitting and underfitting. The percentage of linearity is defined as the negative slope of the activation, i.e. its slope for negative inputs.
Figure 2: Overview of the LayerCollapse process. $(a)$ The regularization loss shifts the PReLU towards linearity. $(b)$ The layer collapse operation eliminates the hidden layer, replacing the two weight matrices with their product.
Figure 3: Top-1 accuracy of VGG11 on CIFAR100 with and without LayerCollapse regularization.
Figure 4: Layer-wise collapse accuracy and parameter reduction analysis for ViT-T/16.
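
The captions for Figures 1 and 2 refer to an activation whose negative slope is moved towards 1. The sketch below illustrates that idea only. The quadratic penalty `(1 - slope) ** 2`, the learning rate, and the function names are assumptions chosen for illustration; the exact form of the paper's regularizer is not given in this section, and in real training the penalty would be added to a task loss rather than minimized alone.

```python
def prelu(x, slope):
    """PReLU on a scalar: identity for x >= 0, slope * x otherwise."""
    return x if x >= 0 else slope * x

def linearity_penalty(slope):
    """Hypothetical penalty (an assumption): zero exactly when slope == 1."""
    return (1.0 - slope) ** 2

def penalty_step(slope, lr):
    """One gradient-descent step on the hypothetical penalty alone."""
    grad = -2.0 * (1.0 - slope)  # derivative of (1 - slope)^2 w.r.t. slope
    return slope - lr * grad

# Start from an ordinary ReLU (slope 0) and apply the penalty repeatedly.
slope = 0.0
for _ in range(50):
    slope = penalty_step(slope, lr=0.1)

# Activation at three levels of linearity for the same negative input.
relu_value = prelu(-2.0, 0.0)    # 0% linearity: ReLU
half_value = prelu(-2.0, 0.5)    # 50% linearity
linear_value = prelu(-2.0, 1.0)  # 100% linearity: identity
```

With the slope at 1 the activation is the identity, which is the condition under which the two surrounding layers can be merged into one.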
