Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training

Xiaoying Zhi, Varun Babbar, Rundong Liu, Pheobe Sun, Fran Silavong, Ruibo Shi, Sean Moran

TL;DR

The paper addresses the energy and computational costs of large neural networks by proposing a one-pass pruning framework that learns a lightweight sub-network jointly with training. It introduces a binary gating module coupled with a polarization regularizer to produce a stable sub-network shared by all inputs (a "consensus" sub-network), and uses a straight-through estimator so that the binary gates can be trained by gradient descent. Empirical results with ResNet architectures on CIFAR-10, CIFAR-100, and Tiny ImageNet show that layer pruning and channel pruning can reduce FLOPs by roughly 14–22% with minimal accuracy loss (often within 3%), outperforming several baselines in similar settings. The approach offers practical energy savings in both training and inference without the overhead of dynamic gating at inference time, supporting greener AI deployment in resource-constrained environments.
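
To make the gating mechanism concrete, the following is a minimal sketch of a binary gate trained with a straight-through estimator, written in plain Python for a single scalar gate on a toy residual block. The class BinaryGateSTE, the helper train_gate, the 0.5 threshold, and the sigmoid-scaled backward pass (one common STE variant) are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class BinaryGateSTE:
    """Hypothetical scalar gate: hard 0/1 forward, straight-through (sigmoid) backward."""

    def __init__(self, logit: float = 0.0, threshold: float = 0.5):
        self.logit = logit          # learnable gate score (one per layer or channel, assumed)
        self.threshold = threshold  # gate is open when sigmoid(logit) >= threshold
        self.prob = sigmoid(logit)

    def forward(self) -> float:
        # Hard binary decision: the forward pass already sees a discrete sub-network.
        self.prob = sigmoid(self.logit)
        return 1.0 if self.prob >= self.threshold else 0.0

    def backward(self, grad_gate: float) -> float:
        # Straight-through estimator: treat d(hard gate)/d(prob) as 1,
        # so only the sigmoid's derivative scales the upstream gradient.
        return grad_gate * self.prob * (1.0 - self.prob)


def train_gate(gate: BinaryGateSTE, x: float, block_out: float, target: float,
               lr: float = 1.0, steps: int = 20) -> BinaryGateSTE:
    """Toy gated residual y = x + g * block_out with squared-error loss (illustrative only)."""
    for _ in range(steps):
        g = gate.forward()
        y = x + g * block_out
        grad_gate = 2.0 * (y - target) * block_out   # dL/dg for L = (y - target)^2
        gate.logit -= lr * gate.backward(grad_gate)  # SGD step on the gate logit
    return gate


gate = BinaryGateSTE(logit=0.5)
print(gate.forward())  # 1.0: the gate starts open
train_gate(gate, x=1.0, block_out=0.5, target=1.0)
print(gate.forward())  # 0.0: the block only increases the loss, so the gate closes
```

The same update also lets a closed gate reopen: the forward pass outputs an exact 0, but the backward pass still sends a sigmoid-scaled gradient to the logit, so a block whose contribution would lower the loss can be switched back on.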

Abstract

The subject of green AI has been gaining attention within the deep learning community given the recent trend of ever larger and more complex neural network models. Existing solutions for reducing the computational load of neural networks at inference time usually involve pruning the network parameters. Pruning schemes often create extra overhead, either through iterative training and fine-tuning for static pruning or through repeated computation of a dynamic pruning graph. We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining performance comparable to the fully parameterized network on given downstream tasks. Our proposed pruning scheme is green-oriented, as it requires only a single training run to discover optimal static sub-networks using dynamic pruning methods. The pruning scheme consists of a binary gating module and a polarizing loss function that together uncover sub-networks with user-defined sparsity. Our method prunes and trains simultaneously, which saves energy in both the training and inference phases and avoids extra computational overhead from gating modules at inference time. Our results on CIFAR-10, CIFAR-100, and Tiny ImageNet suggest that our scheme can remove 50% of connections in deep networks with a <1% reduction in classification accuracy. Compared to other related pruning methods, our method shows a smaller drop in accuracy for equivalent reductions in computational cost.
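
The abstract does not spell out the form of the polarizing loss. As a hedged illustration only, the sketch below combines two hypothetical penalties on the gate probabilities: a polarization term that is zero only when every gate is fully open or fully closed, and a sparsity term that pulls the expected fraction of open gates toward a user-defined keep ratio. The function names, the p * (1 - p) polarization form, and the weights lam_polar and lam_sparse are assumptions, not the paper's loss.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def polarization_penalty(probs):
    # p * (1 - p) is 0 when p is 0 or 1 and peaks at 0.25 when p = 0.5,
    # so minimizing its mean pushes every gate toward a firm open/closed decision.
    return sum(p * (1.0 - p) for p in probs) / len(probs)


def sparsity_penalty(probs, target_keep):
    # Squared gap between the expected fraction of open gates and the
    # user-defined keep ratio (1 - target sparsity).
    keep = sum(probs) / len(probs)
    return (keep - target_keep) ** 2


def gate_regularizer(logits, target_keep, lam_polar=1.0, lam_sparse=1.0):
    """Hypothetical gate loss, added to the task loss during the single training pass."""
    probs = [sigmoid(z) for z in logits]
    return (lam_polar * polarization_penalty(probs)
            + lam_sparse * sparsity_penalty(probs, target_keep))


target_keep = 0.5  # user-defined sparsity of 50%
for name, logits in [("undecided", [0.0] * 4),
                     ("polarized, on target", [8.0, 8.0, -8.0, -8.0]),
                     ("polarized, all open", [8.0] * 4)]:
    print(f"{name:22s} {gate_regularizer(logits, target_keep):.4f}")
```

Only the configuration that is both polarized and at the requested keep ratio scores near zero; the undecided and the all-open configurations both score about 0.25. In training, a term like this would be added to the task loss and differentiated through the straight-through gates.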

Paper Structure (19 sections, 11 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Illustration of a gating module with a binary decision, integrated into the original residual model. The learnable gating modules are trained jointly with the rest of the network. At inference, the gate decisions are pre-loaded, and only the network parameters whose gate is open are loaded and computed.
  • Figure 2: Comparison of our method with some naive baselines on CIFAR-10 with ResNet-56. Left: average pruning rate at inference vs. Top-1 accuracy. Right: % FLOPs reduction at inference vs. Top-1 accuracy. The naive dropout baseline is omitted from the right panel because it does not reduce FLOPs: computation still passes through the "dropped" nodes.
  • Figure 3: Comparison between our scheme and related methods in the literature on CIFAR-10 with ResNet-56 at inference. Left: pruning rate vs. Top-1 accuracy. Right: % FLOPs reduction vs. Top-1 accuracy drop.
  • Figure 4: Illustration of layer-pruning gating modules in ResNet.
  • Figure 5: Illustration of channel-pruning gating modules in ResNet, with the gating module placed (a) before the first convolutional layer; (b) between the two convolutional layers; (c) after the second convolutional layer. $K{=}1$ or $2$ in our experiments.
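
Figure 1 notes that gate decisions are pre-loaded at inference, so only parameters behind open gates are loaded and computed; Figures 4 and 5 place such gates at the layer and channel level. The sketch below counts the resulting multiply-accumulate operations (MACs) for residual blocks built from two k x k convolutions, assuming that a closed layer gate leaves only the identity shortcut and that masked channels are physically removed at inference. Conv, GateDecision, block_macs, and flops_reduction are hypothetical names, and the placements "a", "b", and "c" follow the three positions in Figure 5. FLOPs are commonly taken as about twice the MACs, which leaves the reduction ratio unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Conv:
    """k x k convolution producing an h x w output map (hypothetical spec for MAC counting)."""
    c_in: int
    c_out: int
    k: int
    h: int
    w: int

    def macs(self, c_in: Optional[int] = None, c_out: Optional[int] = None) -> int:
        # One k*k*c_in dot product per output channel and output pixel.
        ci = self.c_in if c_in is None else c_in
        co = self.c_out if c_out is None else c_out
        return self.h * self.w * ci * co * self.k * self.k


@dataclass(frozen=True)
class GateDecision:
    """Gate decisions for one residual block, frozen after training and pre-loaded at inference."""
    layer_open: bool = True
    keep: Optional[int] = None  # channels kept by the channel gate; None means no channel gate
    placement: str = "b"        # "a": before conv1, "b": between convs, "c": after conv2


def block_macs(conv1: Conv, conv2: Conv, d: GateDecision) -> int:
    if not d.layer_open:
        return 0  # closed layer gate: only the identity shortcut runs
    if d.keep is None:
        return conv1.macs() + conv2.macs()
    if d.placement == "a":  # masked block input: conv1 reads fewer channels
        return conv1.macs(c_in=d.keep) + conv2.macs()
    if d.placement == "b":  # masked hidden map: conv1 writes and conv2 reads fewer channels
        return conv1.macs(c_out=d.keep) + conv2.macs(c_in=d.keep)
    if d.placement == "c":  # masked block output: conv2 writes fewer channels
        return conv1.macs() + conv2.macs(c_out=d.keep)
    raise ValueError(f"unknown placement {d.placement!r}")


def flops_reduction(blocks: List[Tuple[Conv, Conv]], decisions: List[GateDecision]) -> float:
    # Fraction of the full network's MACs removed by the frozen gate decisions.
    full = sum(c1.macs() + c2.macs() for c1, c2 in blocks)
    pruned = sum(block_macs(c1, c2, d) for (c1, c2), d in zip(blocks, decisions))
    return 1.0 - pruned / full


# Three identical 16-channel blocks on a 32 x 32 map (CIFAR-style first stage, assumed).
conv = Conv(c_in=16, c_out=16, k=3, h=32, w=32)
blocks = [(conv, conv)] * 3
decisions = [GateDecision(),                       # fully open
             GateDecision(layer_open=False),       # layer pruned
             GateDecision(keep=8, placement="b")]  # half the hidden channels pruned
print(flops_reduction(blocks, decisions))  # 0.5
```

In this toy accounting, a channel gate placed between the two convolutions (placement "b") saves twice as much as placement "a" or "c" for the same number of kept channels, because it shrinks both the output of the first convolution and the input of the second: with 8 of 16 channels kept, the block costs 2,359,296 MACs under "b" versus 3,538,944 under "a" or "c".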