Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression

Jonas Schmitt, Ruiping Liu, Junwei Zheng, Jiaming Zhang, Rainer Stiefelhagen

TL;DR

CPD addresses the need for resource-efficient vision models by proposing a model- and task-agnostic pruning framework that unifies pruning with knowledge distillation. It introduces a Combing step to automatically resolve layer dependencies, a Hessian-based Pruning pipeline to select and remove channels, and a Distillation step to transfer knowledge from the full model to the pruned one. Empirically, CPD yields substantial speedups in classification (up to 4.31×) and notable latency reductions in segmentation (≈48% and 26%) with modest accuracy or mIoU losses, across CNNs and Vision Transformers on ImageNet and ADE20K. The work demonstrates broad generalization and practical impact for deploying compact models in resource-constrained settings, such as intelligent transportation systems and robotics, while outlining directions to extend the framework to other architectures and tasks.
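
The summary above reports classification gains as speedup factors and segmentation gains as latency reductions. As a rough aid for comparing the two, the sketch below converts between them, assuming both are ratios of pruned to original latency measured on the same hardware (an assumption; the measurement protocol is not stated here). The function names are illustrative and do not come from the paper.

```python
def speedup_from_latency_reduction(reduction: float) -> float:
    """Convert a fractional latency reduction r (0 <= r < 1) into a speedup factor.

    If the pruned model takes (1 - r) of the original latency,
    the speedup is original / pruned = 1 / (1 - r).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / (1.0 - reduction)


def latency_reduction_from_speedup(speedup: float) -> float:
    """Convert a speedup factor s (s >= 1) into a fractional latency reduction."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Segmentation figures quoted in the summary (approximate values).
    for name, reduction in [("ViT-DeiT-S", 0.48), ("SeaFormer-L", 0.26)]:
        factor = speedup_from_latency_reduction(reduction)
        print(f"{name}: ~{reduction:.0%} lower latency is roughly x{factor:.2f}")

    # Classification speedup quoted in the summary.
    print(f"x4.31 speedup is roughly {latency_reduction_from_speedup(4.31):.0%} lower latency")
```

Under this reading, a latency reduction of about 48% corresponds to a speedup slightly above ×1.9.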

Abstract

Lightweight and effective models are essential for devices with limited resources, such as intelligent vehicles. Structured pruning offers a promising approach to model compression and efficiency enhancement. However, existing methods often tie pruning techniques to specific model architectures or vision tasks. To address this limitation, we propose a novel unified pruning framework, Comb, Prune, Distill (CPD), which addresses both model-agnostic and task-agnostic concerns simultaneously. Our framework employs a combing step to resolve hierarchical layer-wise dependency issues, enabling architecture independence. Additionally, the pruning pipeline adaptively removes parameters based on importance scoring metrics, regardless of the vision task. To support the model in retaining its learned information, we introduce knowledge distillation during the pruning step. Extensive experiments demonstrate the generalizability of our framework, encompassing both convolutional neural network (CNN) and transformer models, as well as image classification and segmentation tasks. In image classification, we achieve a speedup of up to ×4.3 with an accuracy loss of 1.8%, and in semantic segmentation a speedup of up to ×1.89 with a 5.1% loss in mIoU.
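
The abstract states that knowledge distillation is applied during pruning but does not give the loss here. As a minimal sketch, assuming the common temperature-softened KL-divergence formulation of distillation (not necessarily the one used by CPD), the student can be penalized for diverging from the teacher's output distribution. The function names, the temperature default, and the example logits are all illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_sum for z in scaled]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,  # assumed value, for illustration only
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher (unpruned model)
    log_q = log_softmax(student_logits, temperature)  # student (pruned model)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    print(kd_loss(teacher, teacher))          # identical outputs: zero loss
    print(kd_loss([0.1, 0.2, 0.3], teacher))  # diverging student: positive loss
```

In a training loop, a term like this would typically be added to the task loss with a weighting factor; that weighting is likewise an assumption, not a detail taken from the text above.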

Paper Structure (15 sections, 9 equations, 6 figures, 4 tables)


Figures (6)

  • Figure 1: Model compression results. (a) For image classification, our CPD method achieves a ${\times}2.15$ speedup over ResNet-50. (b) For semantic segmentation on ADE20K, our method reduces the latency of ViT-DeiT-S and SeaFormer-L by ${\sim}48\%$ and ${\sim}26\%$, respectively.
  • Figure 2: Overview of the CPD pipeline, comprising Combing, Pruning, and Distillation. In the combing step (Sec. \ref{sec:meth_combing}), our dependency-resolving algorithm extracts the dependency structure of the given architecture. We then initialize the model to be pruned (student) and the original model (teacher) with the same weights and start pruning the student (Sec. \ref{sec:meth_pruning}). During pruning, we use knowledge distillation (KD, Sec. \ref{sec:meth_kd}) to help the student retain more information.
  • Figure 3: Example of direct relation between operations in a model. Operations with the same color are directly related.
  • Figure 4: Merging of coupled subgroups based on their common parent coupling operation.
  • Figure 5: Sparsity and mIoU of ViT-DeiT-S on ADE20K.
  • ...and 1 more figure
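
The captions of Figures 3 and 4 describe grouping directly related operations and merging coupled subgroups that share a parent coupling operation. The sketch below illustrates the general idea with a disjoint-set (union-find) structure: layers whose outputs meet in the same coupling operation, such as an element-wise addition, end up in one group and would have to be pruned with a shared channel mask. This is a generic illustration under that assumption, not the paper's combing algorithm, and the layer names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set


class DisjointSet:
    """Minimal union-find over string-named layers."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, item: str) -> str:
        self._parent.setdefault(item, item)
        root = item
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[item] != root:
            next_item = self._parent[item]
            self._parent[item] = root
            item = next_item
        return root

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def groups(self) -> List[Set[str]]:
        by_root: Dict[str, Set[str]] = {}
        for item in list(self._parent):
            by_root.setdefault(self.find(item), set()).add(item)
        return list(by_root.values())


def coupled_groups(coupling_ops: Iterable[Sequence[str]]) -> List[Set[str]]:
    """Merge layers that feed the same coupling operation into one group.

    Each entry lists the layers whose outputs are combined by one coupling
    operation; groups sharing a layer are merged transitively.
    """
    ds = DisjointSet()
    for producers in coupling_ops:
        if not producers:
            continue
        first = producers[0]
        ds.find(first)  # register single-input operations as well
        for other in producers[1:]:
            ds.union(first, other)
    return ds.groups()


if __name__ == "__main__":
    # Hypothetical residual network: two chained additions and one separate merge.
    ops = [
        ("conv1", "block1.conv2"),
        ("block1.conv2", "block2.conv2"),
        ("branch_a", "branch_b"),
    ]
    for group in coupled_groups(ops):
        print(sorted(group))
```

Here the two chained additions share `block1.conv2`, so their subgroups merge into a single group of three layers, while the unrelated pair stays separate.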

Theorems & Definitions (1)

  • Definition III.1: Direct Relation