SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models

Yuqi Li, Yao Lu, Junhao Dong, Zeyu Dong, Chuanguang Yang, Xin Yin, Yihao Chen, Jianping Gou, Yingli Tian, Tingwen Huang

TL;DR

SGLP introduces a similarity-guided, fast layer partition pruning framework that combines Centered Kernel Alignment (CKA) for inter-layer similarity, Fisher Optimal Segmentation for semantically coherent layer partitioning, and GradNorm for efficient segment-wise layer importance assessment. By pruning within segments rather than across the full network, SGLP reduces the search space and avoids excessive fine-tuning, achieving substantial reductions in FLOPs and parameters with minimal accuracy loss on both image classification models and large language models. The method shows strong empirical gains on CIFAR-10/100, ImageNet, Imagenette2, Imagewoof2, LLaMA-based LLMs, and CNN1D signal datasets, outperforming state-of-the-art layer pruning approaches. These results suggest practical applicability for deploying large deep models in resource-constrained environments, with potential for extension to broader architectures and combination with other compression techniques.
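To make the inter-layer similarity step concrete, here is a minimal sketch of linear CKA between two activation matrices. It assumes the linear-kernel variant of CKA and activations flattened to shape (samples x features); the summary does not state which kernel the authors use, and the function names are illustrative only. It is written in plain Python to stay dependency-free.

```python
import math


def center_columns(X):
    """Subtract the per-feature mean from every column of X (samples x features)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def gram(X):
    """Linear-kernel Gram matrix X X^T (samples x samples)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]


def frob_inner(A, B):
    """Frobenius inner product: sum over i, j of A[i][j] * B[i][j]."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def linear_cka(X, Y):
    """Linear CKA between two activation matrices with the same number of rows.

    Assumption: linear kernel. Centering the feature columns first makes the
    Gram matrices equal to the doubly centered ones used in the HSIC estimator.
    """
    K = gram(center_columns(X))
    L = gram(center_columns(Y))
    return frob_inner(K, L) / math.sqrt(frob_inner(K, K) * frob_inner(L, L))


# Toy activations for one layer, and a rescaled copy standing in for another layer.
layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
layer_b = [[3.0 * v for v in row] for row in layer_a]
similarity = linear_cka(layer_a, layer_b)  # close to 1: CKA ignores isotropic scaling
```

Evaluating `linear_cka` for every pair of layers on the same input batch would yield the layer-by-layer similarity matrix that the later segmentation step operates on.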

Abstract

Layer pruning has emerged as a potent approach for removing redundant layers from a pre-trained network in order to reduce network size and improve computational efficiency. However, existing layer pruning methods mostly overlook the intrinsic connections and interdependencies between layers within complex deep neural networks. This oversight can result in pruned models that do not preserve the essential characteristics of the pre-trained network as effectively as desired. To address these limitations, we propose Similarity-Guided Layer Partition (SGLP) Pruning, a novel pruning framework that exploits representation similarity to guide efficient and informed layer removal for compressing large deep models. Our method begins by employing Centered Kernel Alignment (CKA) to quantify representational similarity between layers, uncovering structural patterns within the network. We then apply Fisher Optimal Segmentation to the similarity matrix to partition the network into semantically coherent layer segments. This segmentation allows pruning decisions to respect layer interdependencies and preserve essential knowledge. Within each segment, we introduce a fine-tuning-free importance evaluation using GradNorm, identifying and removing redundant layers in a targeted, segment-wise manner. Experimental results on both image classification tasks and large language models (LLMs) demonstrate that the proposed SGLP outperforms state-of-the-art methods in both accuracy and efficiency. Our approach achieves significant model compression with minimal performance degradation, making it well suited for deployment in resource-limited environments.
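The partitioning step can be illustrated with a short sketch of Fisher's optimal segmentation for ordered samples, a dynamic program that splits a sequence into k contiguous segments while minimizing the total within-segment sum of squared deviations. Two assumptions are made here that the abstract does not confirm: each layer is represented by its row of the similarity matrix, and the number of segments `k` is supplied by the user. The function names are hypothetical.

```python
def segment_diameter(rows, i, j):
    """Sum of squared deviations of rows[i:j] from their mean vector."""
    block = rows[i:j]
    means = [sum(col) / len(block) for col in zip(*block)]
    return sum((v - m) ** 2 for row in block for v, m in zip(row, means))


def fisher_segmentation(rows, k):
    """Split ordered rows into k contiguous segments with minimal total diameter.

    Returns a list of half-open (start, end) index pairs. Requires k <= len(rows).
    """
    n = len(rows)
    inf = float("inf")
    # cost[c][j]: best total diameter when rows[:j] is split into c segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[c][j]: start index of the last segment in that best split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + segment_diameter(rows, i, j)
                if cand < cost[c][j]:
                    cost[c][j], cut[c][j] = cand, i
    # Backtrack from the full sequence to recover the segment boundaries.
    segments, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        segments.append((i, j))
        j = i
    return segments[::-1]


# Toy 6-layer similarity matrix with three visible blocks (made-up values).
S = [
    [1.0, 0.9, 0.1, 0.1, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.1, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.9, 0.9, 0.2],
    [0.1, 0.1, 0.9, 1.0, 0.9, 0.2],
    [0.1, 0.1, 0.9, 0.9, 1.0, 0.2],
    [0.0, 0.0, 0.2, 0.2, 0.2, 1.0],
]
segments = fisher_segmentation(S, 3)  # -> [(0, 2), (2, 5), (5, 6)]
```

This sketch recomputes each segment diameter from scratch for clarity; precomputing the diameters once would be the natural optimization for deeper networks.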

Paper Structure

This paper contains 29 sections, 25 equations, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of the proposed framework. We take a 15-layer network as an example; the gray squares denote the layers with the lowest importance, which are discarded during pruning. We first feed batches of inputs to the pre-trained network for forward propagation. The similarity matrix is then derived via Centered Kernel Alignment and indicates the representation similarity among the layers. Based on this layer similarity, we partition the network into layer segments via Fisher Optimal Segmentation, which provides the basis for subsequent layer pruning. Within each layer segment, we evaluate the importance of the layers via GradNorm and remove the unimportant ones to obtain a compact network.
  • Figure 2: Results of ablation studies.
  • Figure 3: Pruned and retained layers for ResNet-56 on CIFAR-10.
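
The segment-wise removal step described in the Figure 1 caption can be sketched as follows, using the same 15-layer setting. The importance scores below are made-up placeholders standing in for GradNorm values, and both the segment boundaries and the per-segment removal counts (`n_remove`) are assumptions chosen for illustration; this summary does not state how the authors set them.

```python
def select_layers_to_prune(segments, scores, n_remove):
    """Pick the lowest-scoring layers inside each segment.

    segments: list of half-open (start, end) layer-index pairs.
    scores:   one importance score per layer (placeholders for GradNorm here).
    n_remove: number of layers to drop in each segment (hypothetical parameter).
    """
    pruned = []
    for (start, end), m in zip(segments, n_remove):
        # Rank only the layers of this segment, least important first.
        ranked = sorted(range(start, end), key=lambda i: scores[i])
        pruned.extend(ranked[:m])
    return sorted(pruned)


# 15-layer example split into three segments (assumed boundaries).
segments = [(0, 5), (5, 10), (10, 15)]
scores = [0.9, 0.2, 0.7, 0.8, 0.6,
          0.3, 0.5, 0.1, 0.9, 0.4,
          0.8, 0.7, 0.05, 0.6, 0.9]
pruned = select_layers_to_prune(segments, scores, n_remove=[1, 2, 1])  # -> [1, 5, 7, 12]
kept = [i for i in range(len(scores)) if i not in pruned]
```

Because the ranking happens inside each segment, a globally low-scoring segment cannot be wiped out entirely, which is the intuition behind pruning within segments rather than across the full network.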