The Simpler The Better: An Entropy-Based Importance Metric To Reduce Neural Networks' Depth

Victor Quétu, Zhu Liao, Enzo Tartaglione

TL;DR

The paper addresses the environmental and computational costs of oversized neural networks by introducing EASIER, an entropy-based metric that identifies rectifier-activated layers whose activations can be linearized to reduce network depth. By iteratively measuring layer-wise entropy on the training data and linearizing the lowest-entropy layer, EASIER reduces depth with limited accuracy loss across multiple architectures and image-classification datasets. The paper positions the method among neural architecture search (NAS) and pruning approaches, highlighting its focus on reducing depth by removing activations and its advantage over entropy-guided pruning, which can suffer from layer collapse. The method also yields practical gains in FLOPs and inference time across diverse hardware. The authors acknowledge limitations related to training efficiency and performance on underfitted models, and outline avenues for future improvement such as one-shot strategies and differentiable entropy proxies.
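The selection step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that a rectifier's state is binary (pre-activation positive or not), that a layer's entropy is the mean binary entropy of its neurons over the training samples, and that pre-activations have already been collected per layer. The names `layer_entropy`, `lowest_entropy_layer`, `layer_a`, and `layer_b` are hypothetical.

```python
import numpy as np


def _xlog2x(p: np.ndarray) -> np.ndarray:
    """Return p * log2(p) element-wise, with the convention 0 * log2(0) = 0."""
    safe = np.where(p > 0, p, 1.0)  # log2(1) = 0, so zero entries contribute 0
    return p * np.log2(safe)


def layer_entropy(pre_activations: np.ndarray) -> float:
    """Mean binary entropy (in bits) of the on/off state of each neuron.

    pre_activations has shape (num_samples, num_neurons) and holds the
    inputs to the rectifier. The binary-state definition is an assumption.
    """
    p_on = (pre_activations > 0).mean(axis=0)  # per-neuron frequency of the "on" state
    p_off = 1.0 - p_on
    per_neuron = -(_xlog2x(p_on) + _xlog2x(p_off))
    return float(per_neuron.mean())


def lowest_entropy_layer(pre_acts_by_layer: dict) -> str:
    """Return the name of the layer whose rectifier has the lowest entropy."""
    entropies = {name: layer_entropy(a) for name, a in pre_acts_by_layer.items()}
    return min(entropies, key=entropies.get)


# Toy data: layer_a switches on and off about equally often, while layer_b is
# almost always on, so its rectifier behaves almost like the identity.
rng = np.random.default_rng(0)
toy_pre_activations = {
    "layer_a": rng.normal(size=(1000, 8)),
    "layer_b": rng.normal(size=(1000, 8)) + 5.0,
}
candidate = lowest_entropy_layer(toy_pre_activations)
```

In this toy setup `candidate` is the layer whose neurons almost never change state. Under the assumptions above, that is the layer whose activation would be replaced by a linear one before the next round of training and evaluation.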

Abstract

While deep neural networks are highly effective at solving complex tasks, large pre-trained models are commonly employed even to solve consistently simpler downstream tasks, which do not necessarily require a large model's complexity. Motivated by the awareness of the ever-growing AI environmental impact, we propose an efficiency strategy that leverages prior knowledge transferred by large models. Simple but effective, we propose a method relying on an Entropy-bASed Importance mEtRic (EASIER) to reduce the depth of over-parametrized deep neural networks, which alleviates their computational burden. We assess the effectiveness of our method on traditional image classification setups. Our code is available at https://github.com/VGCQ/EASIER.

Paper Structure (24 sections, 7 equations, 3 figures, 11 tables, 1 algorithm)

Figures (3)

  • Figure 1: Overview of EASIER. We iteratively train, evaluate, and estimate the entropy on the training set and linearize the lowest-entropy layer of the neural network, until the performance drops.
  • Figure 2: (a) Distribution of the product of $X\sim\mathcal{N}(0,1)$ and $W\sim\mathcal{N}(0,1)$ for different values of $\rho$. (b) $p[Z>0]$ for different values of $\rho$.
  • Figure 3: (a) EASIER applied to ResNet-18, VGG-16, Swin-T, and MobileNetV2 on CIFAR-10. For each model, we gradually remove non-linear layers. (b) EASIER applied to ResNet-18 on CIFAR-10 with different rectifiers: ReLU, LeakyReLU, PReLU, GELU, and SiLU. Our method is not bound to a specific rectifier and is effective with the most popular ones.
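
The quantity plotted in Figure 2 can be explored numerically. The sketch below is a hedged illustration, not the paper's code. It assumes that $Z$ denotes the product $X \cdot W$ and that $\rho$ is the correlation coefficient between $X$ and $W$; neither is stated in the caption. Under those assumptions, the standard orthant-probability result for a bivariate normal gives $p[Z>0] = \tfrac{1}{2} + \arcsin(\rho)/\pi$, which the Monte Carlo estimate should approach. The function names are hypothetical.

```python
import numpy as np


def prob_product_positive(rho: float, num_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P[X * W > 0] for standard normals with correlation rho."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(num_samples)
    noise = rng.standard_normal(num_samples)
    # W is standard normal with corr(X, W) = rho (assumed meaning of rho).
    w = rho * x + np.sqrt(1.0 - rho**2) * noise
    return float(np.mean(x * w > 0))


def prob_product_positive_closed_form(rho: float) -> float:
    """Closed form: probability that two correlated standard normals share a sign."""
    return 0.5 + float(np.arcsin(rho)) / np.pi


estimates = {rho: prob_product_positive(rho) for rho in (-0.9, 0.0, 0.5, 0.9)}
```

With $\rho = 0$ the product is positive half of the time. As $\rho$ moves toward $1$ the sign of the product becomes increasingly predictable, and in the binary-state view its entropy decreases accordingly.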