REPrune: Channel Pruning via Kernel Representative Selection

Mincheol Park; Dongjin Kim; Cheonjun Park; Yuna Park; Gyeong Eun Gong; Won Woo Ro; Suhyun Kim

TL;DR

REPrune addresses the coarse granularity of channel pruning by analyzing kernels at a finer, per-channel level. It applies agglomerative clustering with Ward linkage to identify representative kernels within each input channel, then greedily solves a maximum cluster coverage problem to select the filters that cover these representatives, enabling immediate acceleration within a concurrent training-pruning pipeline. The method retains accuracy at high FLOPs reductions on image recognition and object detection benchmarks, outperforms several channel-, clustering-, and kernel-pruning baselines, and offers training-time efficiency gains. It provides a practical path to deploying highly pruned CNNs on general-purpose hardware without a separate fine-tuning stage, potentially accelerating real-world computer vision workloads on both data-center GPUs and edge devices.

Abstract

Channel pruning is a widely accepted way to accelerate modern convolutional neural networks (CNNs). The resulting pruned model can be deployed immediately on general-purpose software and hardware resources. However, its large pruning granularity, specifically at the unit of a convolution filter, often leads to undesirable accuracy drops because it leaves little flexibility in deciding how and where to introduce sparsity into the CNN. In this paper, we propose REPrune, a novel channel pruning technique that emulates kernel pruning, fully exploiting the finer but structured granularity. REPrune identifies similar kernels within each channel using agglomerative clustering. Then, it selects filters that maximize the incorporation of kernel representatives while optimizing the maximum cluster coverage problem. By integrating with a simultaneous training-pruning paradigm, REPrune promotes efficient, progressive pruning throughout CNN training, avoiding the conventional train-prune-finetune sequence. Experimental results show that REPrune performs better in computer vision tasks than existing methods, effectively balancing acceleration ratio and performance retention.
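
The per-channel clustering step described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the nested-list weight layout `weight[f][c]`, and the `n_clusters` parameter are assumptions made for illustration, and the naive merge loop stands in for a library routine (for example, SciPy's `linkage` with `method="ward"`). Each merge joins the two clusters whose union increases within-cluster variance the least, which is Ward's criterion.

```python
def ward_cluster(points, n_clusters):
    """Naive agglomerative clustering using Ward's merge cost.

    points: list of equal-length float vectors (here, flattened kernels).
    Returns one cluster label per point.
    """
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Ward cost: increase in within-cluster sum of squares.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (ma + mb, centroid)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    labels = [0] * len(points)
    for label, (members, _) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def cluster_kernels_per_channel(weight, n_clusters):
    """weight[f][c] is the flattened kernel of filter f for input channel c.

    Returns labels[c][f]: cluster id of filter f's kernel within channel c.
    """
    n_filters, n_channels = len(weight), len(weight[0])
    return [
        ward_cluster([weight[f][c] for f in range(n_filters)], n_clusters)
        for c in range(n_channels)
    ]
```

With four filters and one input channel whose kernels form two tight groups, asking for two clusters places each group in its own cluster. How the number of clusters is derived from the target sparsity is left to the caller here.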

Paper Structure (43 sections, 17 equations, 6 figures, 12 tables, 2 algorithms)

Introduction
Related Work
Channel pruning
Clustering-based pruning
Kernel pruning
Methodology
Prerequisite
Preliminary: Agglomerative Clustering
Foundation of Clusters Per Channel
Filter Selection via Maximum Cluster Coverage
Complete Pipeline of REPrune
Experiment
Datasets and models
Evaluation settings
Image Recognition
...and 28 more sections

Figures (6)

Figure 1: To accelerate a CNN and minimize its information loss simultaneously, REPrune intends to select filters associated with patterned key kernels targeted by kernel pruning.
Figure 2: An overview of the REPrune methodology for identifying redundant kernels and selecting filters. Every channel performs agglomerative clustering on its corresponding kernel set in each layer. Once clusters are formed in accordance with the target channel sparsity, our proposed solver for the MCP starts its greedy filter selection until the target number of channels is satisfied. This solver selects a filter that includes a representative kernel from each grouped cluster per channel.
Figure 3: The illustration of coverage rates for ResNet-56 on CIFAR-10 during the optimization of our proposed MCP. Each box contains the coverage rates from all pruned convolutional layers throughout the training epoch.
Figure 4: Computing throughput (img/s) on image inference using ResNet-50 on the ImageNet validation dataset.
Figure 5: Pruning strategy for the basic residual block, where only the $3\times 3$ convolutional layer marked with dotted lines is pruned by 50% (assuming $s^l$ is 0.5), leaving the output channel dimension unchanged.
...and 1 more figure
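
The greedy filter selection described in the Figure 2 caption can be sketched as a standard greedy maximum-coverage heuristic. This is an illustrative approximation under stated assumptions, not the paper's exact solver: the function name, the `labels[c][f]` layout (cluster id of filter `f`'s kernel in channel `c`), the `n_keep` parameter, and the lowest-index tie-breaking are all hypothetical choices, and the paper's scoring rule may differ.

```python
def greedy_filter_selection(labels, n_keep):
    """Pick n_keep filters that together cover as many per-channel
    kernel clusters as possible (plain greedy max-coverage).

    labels[c][f]: cluster id of filter f's kernel within input channel c.
    Returns (selected filter indices, set of covered (channel, cluster) pairs).
    """
    n_channels, n_filters = len(labels), len(labels[0])
    # covers[f]: the (channel, cluster) pairs that filter f's kernels fall into.
    covers = [
        {(c, labels[c][f]) for c in range(n_channels)}
        for f in range(n_filters)
    ]
    covered, selected = set(), []
    remaining = set(range(n_filters))
    while len(selected) < n_keep and remaining:
        # Choose the filter adding the most uncovered clusters;
        # ties go to the lowest filter index (an arbitrary choice).
        best = max(sorted(remaining), key=lambda f: len(covers[f] - covered))
        selected.append(best)
        covered |= covers[best]
        remaining.remove(best)
    return selected, covered


# Toy example: 2 input channels, 4 filters, 2 clusters per channel.
toy_labels = [[0, 0, 1, 1],
              [0, 1, 1, 0]]
selected, covered = greedy_filter_selection(toy_labels, n_keep=2)
total_clusters = len({(c, k) for c, row in enumerate(toy_labels) for k in row})
coverage_rate = len(covered) / total_clusters
```

In this toy case the first pick covers one cluster in each channel, and the second pick is the filter whose kernels fall into the two clusters still uncovered, so two filters out of four cover every cluster. The `coverage_rate` value computed here is only a simple ratio of covered to total clusters; it is meant to convey the idea behind the coverage rates reported in Figure 3, not to reproduce their definition.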
