MaxQ: Multi-Axis Query for N:M Sparsity Network

Jingyang Xiang; Siqi Li; Junhao Chen; Zhuangzhi Chen; Tianxin Huang; Linpeng Peng; Yong Liu

TL;DR

An efficient and effective Multi-Axis Query methodology, dubbed MaxQ, that achieves consistent improvements across diverse CNN architectures in various computer vision tasks, including image classification, object detection and instance segmentation.

Abstract

N:M sparsity has received increasing attention because it offers a better trade-off between accuracy and latency than structured or unstructured sparsity. However, existing N:M sparsity methods do not differentiate the relative importance of weights among blocks, so important weights are underappreciated. In addition, they apply N:M sparsity to the whole network at once, which causes severe information loss. As a result, they remain sub-optimal. In this paper, we propose an efficient and effective Multi-Axis Query methodology, dubbed MaxQ, to rectify these problems. During training, MaxQ dynamically generates soft N:M masks that take weight importance across multiple axes into account. This enhances the more important weights and ensures more effective updates. Meanwhile, an incremental sparsity strategy gradually increases the percentage of weight blocks that follow the N:M pattern, which allows the network to heal progressively from pruning-induced damage. At runtime, the soft N:M masks can be precomputed as constants and folded into the weights, without distorting the sparse pattern or incurring additional computational overhead. Comprehensive experiments demonstrate that MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks, including image classification, object detection and instance segmentation. For ResNet50 with a 1:16 sparse pattern, MaxQ achieves 74.6% top-1 accuracy on ImageNet, improving on the state of the art by over 2.8%. Code and checkpoints are available at https://github.com/JingyangXiang/MaxQ.
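
To make the N:M pattern itself concrete, the following is a minimal sketch of a plain hard, magnitude-based N:M mask: in every block of M consecutive weights, only the N largest-magnitude weights are kept. It is not MaxQ's soft, multi-axis mask, which the abstract does not specify in detail; the function names `nm_mask` and `apply_mask` and the example weights are hypothetical and chosen only for illustration.

```python
from typing import List


def nm_mask(weights: List[float], n: int, m: int) -> List[int]:
    """Return a 0/1 mask keeping the n largest-magnitude weights in each block of m.

    Illustrative hard N:M mask only; this is an assumption-level sketch,
    not the soft mask generation used by MaxQ.
    """
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        block = range(start, start + m)
        # Sort block indices by descending magnitude; ties keep the earlier index.
        keep = sorted(block, key=lambda i: -abs(weights[i]))[:n]
        for i in keep:
            mask[i] = 1
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Fold a precomputed mask into the weights by elementwise multiplication."""
    return [w * k for w, k in zip(weights, mask)]


# Hypothetical example: a 2:4 pattern over two blocks of four weights.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
mask = nm_mask(weights, n=2, m=4)
sparse = apply_mask(weights, mask)
print(mask)
print(sparse)
```

Because the mask is a constant once training ends, multiplying it into the weights ahead of time leaves exactly N nonzero entries per block, which is the property that N:M-aware hardware and kernels rely on.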

Paper Structure (22 sections, 11 equations, 8 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Sparsity Granularity in Network Compression
Methodology
Preliminaries
Multi-Axis Query
Incremental Sparsity
Experiment
Experiment Settings
Comparison with N:M sparsity
Comparison with Unstructured Sparsity
Object Detection and Instance Segmentation
Ablation Study
Performance Analysis
Quantization
...and 7 more sections

Figures (8)

Figure 1: Comparison of the accuracy vs. sparse-pattern Pareto curves of ResNet50 on ImageNet. MaxQ achieves the top-performing Pareto frontier compared with previous N:M sparsity methods [hubara2021accelerated, zhou2021srste, zhang2022learning, Zhang2023Bimask].
Figure 2: The framework of our MaxQ method, which queries the weight importance among the blocks and generates soft masks by querying the weights across multiple axes. For simplicity, we only show a single-axis query.
Figure 3: The process of computing soft masks. The weights of the model are sorted in descending order based on their magnitudes for clear understanding. We assume $(N, p, \tau)=(4,0.5,0.1)$.
Figure 4: MaxQ to generate soft masks.
Figure 5: Convergence visualization for different strategies. Inverse means that we first apply N:M sparsity to blocks with smaller $\ell_1$-norm. N-M means reducing $\left \| \mathbf{b}^l_{g,:} \right \|_0$ from M to N.
...and 3 more figures
