MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer

Yu-Shan Tai, An-Yeu Wu

TL;DR

Vision transformers (ViTs) have high compute and memory demands, and post-training quantization (PTQ) of them is difficult at low bit-widths, in part because their activation distributions are asymmetric. The authors propose MPTQ-ViT, a mixed-precision PTQ framework that combines three components: SQ-b, which reduces activation asymmetry; OPT-m, which determines data-driven, region-wise scaling factors for post-GeLU values; and Greedy MP, which allocates layer-wise bit-widths by balancing model performance against compressibility. Experiments on ViT, DeiT, and Swin show accuracy gains over prior PTQ baselines on ImageNet under both single-precision and mixed-precision quantization, especially at low bit-widths, with competitive results on COCO. Overall, the work indicates that data-driven quantization parameters combined with greedy layer-wise bit-width allocation can improve both compressibility and accuracy for ViTs.
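To make the bit-width allocation idea concrete, here is a minimal sketch of a generic greedy allocator. It is not the paper's Greedy MP algorithm: the per-layer sensitivity table, the layer names and sizes, and the "added loss per saved bit" score are all hypothetical choices made for illustration.

```python
def greedy_bit_allocation(sensitivity, sizes, candidate_bits, target_avg_bits):
    """Lower per-layer bit-widths greedily until the size-weighted average
    bit-width meets the target.

    sensitivity[layer][bits]: hypothetical loss estimate for that layer at
        that bit-width (e.g. measured on calibration data).
    sizes[layer]: number of values stored for that layer.
    """
    ordered = sorted(candidate_bits, reverse=True)
    bits = {name: ordered[0] for name in sizes}  # start at the highest precision
    total = sum(sizes.values())

    def average_bits():
        return sum(bits[n] * sizes[n] for n in sizes) / total

    while average_bits() > target_avg_bits:
        best = None
        for name in sizes:
            idx = ordered.index(bits[name])
            if idx + 1 >= len(ordered):
                continue  # already at the lowest candidate bit-width
            lower = ordered[idx + 1]
            added_loss = sensitivity[name][lower] - sensitivity[name][bits[name]]
            saved_bits = (bits[name] - lower) * sizes[name]
            score = added_loss / saved_bits  # assumed trade-off score
            if best is None or score < best[0]:
                best = (score, name, lower)
        if best is None:
            break  # nothing left to reduce
        _, name, lower = best
        bits[name] = lower
    return bits


# Hypothetical three-layer model.
sizes = {"attn": 100, "mlp": 200, "head": 50}
sensitivity = {
    "attn": {8: 0.0, 6: 0.1, 4: 0.9},
    "mlp": {8: 0.0, 6: 0.05, 4: 0.3},
    "head": {8: 0.0, 6: 0.2, 4: 2.0},
}
allocation = greedy_bit_allocation(sensitivity, sizes, (4, 6, 8), target_avg_bits=6)
average = sum(allocation[n] * sizes[n] for n in sizes) / sum(sizes.values())
```

In this toy setup the large, insensitive layer is pushed to the lowest precision first, while the small, sensitive layer keeps its full bit-width.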

Abstract

While vision transformers (ViTs) have shown great potential in computer vision tasks, their intense computation and memory requirements pose challenges for practical applications. Existing post-training quantization methods leverage value redistribution or specialized quantizers to address the non-normal distribution in ViTs. However, because they ignore the asymmetry in activations and rely on hand-crafted settings, these methods often struggle to maintain performance under low-bit quantization. To overcome these challenges, we introduce SmoothQuant with bias term (SQ-b) to alleviate the asymmetry issue and reduce the clamping loss. We also introduce optimal scaling factor ratio search (OPT-m) to determine quantization parameters automatically through a data-dependent mechanism. To further enhance compressibility, we incorporate these techniques into a mixed-precision post-training quantization framework for vision transformers (MPTQ-ViT). We develop greedy mixed-precision quantization (Greedy MP) to allocate layer-wise bit-widths considering both model performance and compressibility. Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements compared with SOTA on the ImageNet dataset. Specifically, our proposed methods achieve accuracy improvements ranging from 0.90% to 23.35% on 4-bit ViTs with single-precision and from 3.82% to 78.14% on 5-bit fully quantized ViTs with mixed-precision.
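The asymmetry argument can be illustrated with a toy example. The sketch below assumes a uniform symmetric quantizer and picks the shift as the midpoint of the observed range; both are assumptions made here, the smoothing scale of SmoothQuant is left out, and the paper's actual SQ-b formulation may differ. It shows that shifting an asymmetric channel before quantization lowers the error, and that the shift can be folded into the bias of the following linear layer without changing its output.

```python
import random


def quantize_symmetric(values, bits):
    """Uniform symmetric quantization: round to a signed grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def linear(x, w, c):
    """Single-output linear layer: dot(w, x) + c."""
    return sum(wi * xi for wi, xi in zip(w, x)) + c


random.seed(0)
# Toy asymmetric channel: most of the signed grid is wasted on the negative side.
acts = [random.uniform(-0.2, 3.0) for _ in range(1000)]
bits = 4

# Quantize directly.
err_plain = mse(acts, quantize_symmetric(acts, bits))

# Assumed shift: midpoint of the observed range, so the shifted data is symmetric.
shift = (max(acts) + min(acts)) / 2.0
shifted_q = quantize_symmetric([v - shift for v in acts], bits)
err_shifted = mse(acts, [q + shift for q in shifted_q])

# Folding the shift into the next layer: (x - b) . w + (c + b . w) == x . w + c
w, c = [0.5, -1.0, 2.0], 0.1
x = [0.3, 2.5, 1.0]
b = [shift] * len(x)
x_shifted = [xi - bi for xi, bi in zip(x, b)]
c_folded = c + sum(wi * bi for wi, bi in zip(w, b))
y_ref = linear(x, w, c)
y_folded = linear(x_shifted, w, c_folded)
```

Because the shifted channel is centered, the quantizer step roughly halves at the same bit-width, which is where the error reduction comes from.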

Paper Structure (20 sections, 11 equations, 6 figures, 8 tables)

Introduction
Related Works
Vision Transformers (ViTs)
Value Redistribution for Quantization
Specialized Post-Training Quantizer for ViTs
Mixed-Precision Quantization (MP)
Proposed Methods
SmoothQuant with Bias Term (SQ-b)
Optimal Scaling Factor Ratio Search (OPT-m)
Greedy MP Quantization (Greedy MP)
Experimental Results
Performance Comparison and Ablation Study
Single-Precision Quantization (SP)
Mixed-Precision Quantization (MP)
Comparison with Value Redistribution for NLP
...and 5 more sections

Figures (6)

Figure 1: Proposed mixed-precision post-training quantization framework for ViT (MPTQ-ViT). (a) SQ-b, (b) OPT-m, and (c) Greedy MP.
Figure 2: Box plots of block-wise post-GeLU values on (a) ViT-B and (b) DeiT-S.
Figure 3: OPT-m under 6-bit quantization. Neg-GeLU/Pos-GeLU are the histograms of negative/positive post-GeLU values.
Figure 4: L2 distance between $\mu$ and $\mu_r$ of ViT-L.
Figure 5: Distribution of negative (Neg) and positive (Pos) post-GeLU values of the $9^{th}$ block of DeiT-S under 4-bit quantization: (a)(b) original, (c)(d) TSPTQ-ViT, (e)(f) proposed OPT-m.
...and 1 more figures
