Towards Meta-Pruning via Optimal Transport

Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

TL;DR

The paper addresses how to prune large neural networks without heavy fine-tuning. It introduces Intra-Fusion, a meta-pruning framework that uses Optimal Transport (OT)-based model fusion to merge the neurons that would otherwise be discarded into the surviving ones. By recasting the pruning step as OT- and batch-normalization (BN)-aware fusion, Intra-Fusion recovers substantial accuracy without any data and, through the Split-Data strategies PaF and FaP, substantially reduces training time. Experiments across multiple architectures and datasets show consistent gains: large improvements in the data-free setting and, in the data-driven setting, gains that complement fine-tuning. The paper also offers theoretical and empirical insights into output preservation and into where the pruned model lies in the loss landscape. Finally, the approach shows promise as a scalable alternative to conventional data-parallel training because it reduces communication and synchronization overhead.
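The summary mentions BN-aware fusion but does not spell out how BatchNorm is handled. The sketch below shows one standard, related technique and is only a hedged illustration: it folds an inference-mode BatchNorm layer into the layer before it, so each output neuron's weights and bias already include its normalization before neurons are compared or averaged. Whether Intra-Fusion folds BN this way or instead merges the BN parameters alongside the weights is an open assumption here. `fold_batchnorm` is a hypothetical helper.

```python
import numpy as np


def fold_batchnorm(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the linear/conv layer before it.

    Assumption: this standard folding stands in for the paper's BN-aware step.
    weight: (out, ...); bias, gamma, beta, running_mean, running_var: (out,).
    Returns (weight, bias) so that the folded layer equals layer-then-BN.
    """
    scale = gamma / np.sqrt(running_var + eps)  # per-output-neuron multiplier
    # Broadcast the scale over every axis except the output-neuron axis.
    w_folded = weight * scale.reshape((-1,) + (1,) * (weight.ndim - 1))
    b_folded = (bias - running_mean) * scale + beta
    return w_folded, b_folded
```

After folding, every neuron is a self-contained (weights, bias) pair, so any averaging of neurons automatically carries the normalization with it.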

Abstract

Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice that often results in significant accuracy loss and necessitates subsequent fine-tuning. This paper introduces a novel approach named Intra-Fusion, challenging this prevailing pruning paradigm. Unlike existing methods that focus on designing meaningful neuron importance metrics, Intra-Fusion redefines the overarching pruning procedure. Using the concepts of model fusion and Optimal Transport, we leverage an agnostically given importance metric to arrive at a more effective sparse model representation. Notably, our approach achieves substantial accuracy recovery without the need for resource-intensive fine-tuning, making it an efficient and promising tool for neural network compression. Additionally, we explore how fusion can be added to the pruning process to significantly decrease the training time while maintaining competitive performance. We benchmark our results for various networks on commonly used datasets such as CIFAR-10, CIFAR-100, and ImageNet. More broadly, we hope that the proposed Intra-Fusion approach invigorates exploration into a fresh alternative to the predominant compression approaches. Our code is available here: https://github.com/alexandertheus/Intra-Fusion.
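The core idea, merging pruned neurons into the survivors with Optimal Transport given an arbitrary importance metric, can be sketched in NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation (the linked repository has that). Each output neuron's flattened weights are treated as a point. The ℓ1-norm importance selects the k survivors and also sets the source/target probability masses, which is one plausible choice in the spirit of the options Figure 2 lists. Squared Euclidean distance serves as the ground cost. The transport problem is solved with a plain entropic Sinkhorn solver rather than exact OT. The names `l1_importance`, `sinkhorn`, and `intra_fuse` are hypothetical.

```python
import numpy as np


def l1_importance(weight):
    # Hypothetical helper: l1-norm of each output neuron (first axis).
    return np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)


def sinkhorn(a, b, cost, reg=0.05, n_iters=500):
    # Entropy-regularized OT via Sinkhorn iterations (a stand-in for exact OT).
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan, shape (len(a), len(b))


def intra_fuse(weight, keep_ratio=0.5, importance_fn=l1_importance, reg=0.05):
    """Keep the top-k neurons of a layer, then merge ALL neurons into them via OT.

    Assumptions (not taken from the paper): importance-proportional masses,
    squared-Euclidean cost, Sinkhorn solver.
    """
    n = weight.shape[0]
    k = max(1, int(round(n * keep_ratio)))
    flat = weight.reshape(n, -1)
    imp = importance_fn(weight)
    keep = np.sort(np.argsort(-imp)[:k])  # indices of the k most important neurons

    a = imp / imp.sum()              # source mass: every neuron, weighted by importance
    b = imp[keep] / imp[keep].sum()  # target mass: survivors only

    # Ground cost between every neuron and every survivor, rescaled to [0, 1].
    cost = ((flat[:, None, :] - flat[None, keep, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)

    plan = sinkhorn(a, b, cost, reg)
    # Each survivor becomes the transport-weighted average of the neurons sent to it.
    fused = (plan.T @ flat) / plan.sum(axis=0)[:, None]
    return fused.reshape((k,) + weight.shape[1:]), keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 3, 3, 3))  # e.g. 8 conv filters of shape 3x3x3
    W_small, kept = intra_fuse(W, keep_ratio=0.5)
    print(W_small.shape)  # (4, 3, 3, 3)
```

Unlike plain pruning, which would simply return `W[kept]`, each surviving neuron here is a convex combination of several original neurons, so the discarded ones still contribute. In a full network, the next layer's input channels would also have to be reduced to the surviving indices (or combined using the same transport plan). That step is omitted from this sketch.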

Paper Structure

This paper contains 43 sections, 2 equations, 28 figures, 21 tables, and 3 algorithms.

Figures (28)

  • Figure 1: Structural pruning by considering groups
  • Figure 2: Options for target/source probability mass distribution with $\ell_1$-norm as importance.
  • Figure 3: Data-Free Pruning: ResNet50 on ImageNet. $\ell_1$ (left), Taylor (right).
  • Figure 4: Data-Free Pruning: ResNet18 on CIFAR-10 and VGG11-BN on CIFAR-100, $\ell_1$.
  • Figure 5: Output preservation comparison of the original model. Model: ResNet18. Dataset: CIFAR-10. Group: 0. Importance metric: $\ell_1$.
  • ...and 23 more figures