IDAP++: Advancing Divergence-Based Pruning via Filter-Level and Layer-Level Optimization

Aleksei Samarin, Artem Nazarenko, Egor Kotenko, Valentin Malykh, Alexander Savelev, Aleksei Toropov

TL;DR

IDAP++ tackles neural-network overparameterization by introducing a flow-divergence–based framework that jointly prunes at filter and layer levels. Modeling networks as continuous information-flow trajectories, it defines stage-specific divergences, an iterative non-linear pruning schedule, and an adaptive budget to guarantee bounded accuracy loss. The two-stage approach achieves major reductions in FLOPs and parameters across CNNs, transformers, and NLP models, with competitive or superior task performance and significant wall-clock savings versus baselines. The method provides both practical deployment benefits in resource-constrained environments and a theoretical lens in which neural networks behave as information-flow systems with additive composition and scale-invariant metrics. Its architecture-agnostic design and demonstrated cross-domain applicability make it a strong candidate for hardware-aware compression and future integration with quantization and co-design.

Abstract

This paper presents a novel approach to neural network compression that addresses redundancy at both the filter and architectural levels through a unified framework grounded in information flow analysis. Building on the concept of tensor flow divergence, which quantifies how information is transformed across network layers, we develop a two-stage optimization process. The first stage employs iterative divergence-aware pruning to identify and remove redundant filters while preserving critical information pathways. The second stage extends this principle to higher-level architecture optimization by analyzing layer-wise contributions to information propagation and selectively eliminating entire layers that have minimal impact on network performance. The proposed method adapts naturally to diverse architectures, including convolutional networks, transformers, and hybrid designs, and provides a consistent metric for comparing structural importance across different layer types. Experimental validation across multiple modern architectures and datasets shows that this combined approach achieves substantial model compression while maintaining competitive accuracy. Its parameter reductions are comparable overall to those of state-of-the-art solutions and exceed them on a wide range of modern neural network architectures, from convolutional models to transformers. The results demonstrate that flow divergence serves as an effective guiding principle for both filter-level and layer-level optimization, offering practical benefits for deployment in resource-constrained environments.
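The two-stage process described in the abstract can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the authors' algorithm: each layer is represented as a list of per-filter scores standing in for flow divergence, and the names `two_stage_prune`, `accuracy_drop`, `budget`, and `ratios` are hypothetical. The actual divergence definition, pruning schedule, and budget rule are given in the paper and are not reproduced here.

```python
from typing import Callable, List

Layer = List[float]  # one hypothetical importance score per filter


def two_stage_prune(
    layers: List[Layer],
    accuracy_drop: Callable[[List[Layer]], float],
    budget: float,
    ratios: List[float],
) -> List[Layer]:
    """Toy two-stage loop; the scores and accuracy_drop are stand-ins."""
    current = [list(layer) for layer in layers]

    # Stage 1: filter-level pruning, one (assumed) ratio per iteration.
    for ratio in ratios:
        candidate = []
        for layer in current:
            keep = max(1, len(layer) - int(len(layer) * ratio))
            # Keep the highest-scoring filters and drop the rest.
            candidate.append(sorted(layer, reverse=True)[:keep])
        if accuracy_drop(candidate) > budget:
            break  # this step would exceed the budget, so stop here
        current = candidate

    # Stage 2: layer-level pruning, trying the lowest-scoring layers first.
    for layer in sorted(current, key=sum):
        if len(current) == 1:
            break  # never remove the last remaining layer
        candidate = [other for other in current if other is not layer]
        if accuracy_drop(candidate) <= budget:
            current = candidate
    return current


# Hypothetical usage: the accuracy drop is modelled as the lost score mass.
layers = [[0.9, 0.1, 0.5, 0.05], [0.02, 0.01], [0.8, 0.7, 0.3, 0.2]]
total = sum(sum(layer) for layer in layers)


def accuracy_drop(net: List[Layer]) -> float:
    return 1.0 - sum(sum(layer) for layer in net) / total


pruned = two_stage_prune(layers, accuracy_drop, budget=0.2, ratios=[0.25, 0.5])
print(pruned)
```

In a real setting the scores would be recomputed from the network after every pruning step and `accuracy_drop` would be measured on held-out data; the fixed scores here only keep the example self-contained.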

Paper Structure

This paper contains 52 sections, 4 theorems, 55 equations, 4 figures, 38 tables, and 6 algorithms.

Key Result

Theorem 1

For any network $\mathcal{N}_0$ compressed with IDAP++, the compressed network $\mathcal{N}^*$ satisfies: while achieving maximal sparsity under the given constraints.

Figures (4)

  • Figure 1: Visualization of information flow through network depth. Arrows represent derivative-based flow measurements at different depth coordinates s.
  • Figure 2: Comparison of pruning methods under 50-80% sparsity.
  • Figure 3: Evolution of peak VRAM usage during IDAP++ compression for vision models.
  • Figure 4: Evolution of relative computational cost during IDAP++ compression for vision models.

Theorems & Definitions (9)

  • Theorem 1
  • Lemma 2: Scale Invariance
  • proof
  • Lemma 3: Additive Composition
  • proof
  • Theorem 4: Additive Composition
  • proof
  • proof
  • proof