MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization

Han-Byul Kim, Joo Hyung Lee, Sungjoo Yoo, Hong-Seok Kim

TL;DR

MetaMix addresses activation instability encountered during mixed-precision activation quantization by introducing a meta-state precision searcher that couples bit-width exploration with weight training. The method alternates between bit-meta training, which builds a stable meta-state across multiple bit-width branches, and bit-search training, which learns per-layer bit-width probabilities on a fixed meta-state, followed by a weight-fine-tuning phase. Empirically, MetaMix achieves state-of-the-art accuracy-cost trade-offs on ImageNet for MobileNet-v2, MobileNet-v3, and ResNet-18, outperforming both mixed- and single-precision SOTA methods with faster bit-width search than NAS-based approaches. The approach stabilizes activation statistics, reduces training instability, and offers a practical path to efficient, high-accuracy quantized networks for edge and mobile deployments.

Abstract

Mixed-precision quantization of efficient networks often suffers from activation instability encountered during the exploration of bit selections. To address this problem, we propose a novel method called MetaMix, which consists of a bit selection phase and a weight training phase. The bit selection phase iterates two steps: (1) the mixed-precision-aware weight update, and (2) the bit-search training with the fixed mixed-precision-aware weights. Together, these steps reduce activation instability in mixed-precision quantization and contribute to fast, high-quality bit selection. The weight training phase reuses the weights and step sizes trained in the bit selection phase and fine-tunes them, thereby offering fast training. Our experiments with efficient and hard-to-quantize networks, i.e., MobileNet v2 and v3, and ResNet-18 on ImageNet show that our proposed method pushes the boundary of mixed-precision quantization, in terms of accuracy vs. operations, by outperforming both mixed- and single-precision SOTA methods.
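
The two-step structure of the bit selection phase can be illustrated with a small sketch. This is not the authors' implementation: it assumes a uniform unsigned activation quantizer with a per-branch step size, a softmax over per-layer bit-width logits as a stand-in for however the paper parameterizes bit-width probabilities, and a strict per-round alternation of the two steps. All function and parameter names (`quantize_activation`, `mixed_activation`, `run_bit_selection`, `bit_meta_step`, `bit_search_step`) are hypothetical.

```python
import math


def quantize_activation(x, bits, step):
    """Uniform unsigned quantizer (assumed form): scale by the step size,
    round to the nearest integer level, clamp to [0, 2^bits - 1], rescale."""
    qmax = 2 ** bits - 1
    level = min(max(round(x / step), 0), qmax)
    return level * step


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_activation(x, logits, bit_candidates, steps):
    """Probability-weighted sum of the quantized branches of one layer.
    `logits` play the role of the per-layer bit-width parameters (assumption)."""
    probs = softmax(logits)
    return sum(
        p * quantize_activation(x, b, s)
        for p, b, s in zip(probs, bit_candidates, steps)
    )


def run_bit_selection(num_rounds, bit_meta_step, bit_search_step):
    """Alternate the two steps named in the abstract (schedule is an assumption):
    (1) update weights and step sizes across bit-width branches,
    (2) update bit-width parameters while the weights stay fixed."""
    for round_idx in range(num_rounds):
        bit_meta_step(round_idx)
        bit_search_step(round_idx)


if __name__ == "__main__":
    # One activation value passed through an 8-bit and a 3-bit branch.
    print(mixed_activation(0.9, [0.0, 0.0], [8, 3], [0.01, 0.25]))
    run_bit_selection(
        2,
        lambda r: print("bit-meta step, round", r),
        lambda r: print("bit-search step, round", r),
    )
```

In a real training loop the two callbacks would each run an optimizer update on a different parameter group; here they are placeholders that only show the ordering.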

Paper Structure

This paper contains 35 sections, 6 equations, 12 figures, 10 tables, 1 algorithm.

Figures (12)

  • Figure 1: Input activation distributions before the quantizer under 8-bit, 4-bit, and 3-bit single-precision activation quantization of MobileNet-v2, (a) in the 5th layer (depth-wise convolution in the 2nd block) and (b) in the 32nd layer (depth-wise convolution in the 11th block). Each row shares a fixed weight bit-width and each column shares a fixed activation bit-width. 'FP' denotes full precision. In all panels, the x-axis shows values and the y-axis shows frequency on a log scale.
  • Figure 2: Trend of batch-norm statistics over iterations as the activation bit-width changes. We plot the running variance of the batch norm that follows the 5th (depth-wise) convolution layer in the 2nd block (Top: FP weights, Middle: 4-bit weights, Bottom: MetaMix applied with FP weights).
  • Figure 3: MetaMix flow diagram and working mechanism.
  • Figure 4: MetaMix block structure and operations in the bit selection phase. ‘Act’ denotes activation.
  • Figure 5: ImageNet top-1 accuracy vs. BOPs on (a) MobileNet-v2, (b) MobileNet-v3 (large), and (c) ResNet-18, compared against HMQ, DQ, DJPQ, HAQ, NIPQ, Fracbits, DDQ, SDQ, DNAS, HAWQ-v3, EdMIPS, and the state-of-the-art single-precision quantization method PROFIT. PROFIT$^\dagger$ quantizes the input activation of the 1st layer to 8 bits.
  • ...and 7 more figures
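
The BOPs axis in Figure 5 can be made concrete with a short sketch. It assumes the commonly used definition in which a layer's bit operations equal its multiply-accumulate count times the weight bit-width times the activation bit-width; the exact cost model used in the paper is not stated in this excerpt, and the helper names `conv_macs` and `layer_bops` are hypothetical.

```python
def conv_macs(c_in, c_out, kernel_h, kernel_w, out_h, out_w, groups=1):
    """Multiply-accumulate count of a 2D convolution (bias ignored)."""
    return (c_in // groups) * c_out * kernel_h * kernel_w * out_h * out_w


def layer_bops(macs, weight_bits, act_bits):
    """Bit operations under the assumed definition: MACs x weight bits x activation bits."""
    return macs * weight_bits * act_bits


if __name__ == "__main__":
    # A depth-wise 3x3 convolution with 32 channels on a 112x112 output map
    # (illustrative shape, not taken from the paper).
    macs = conv_macs(32, 32, 3, 3, 112, 112, groups=32)
    for act_bits in (8, 4, 3):
        print(act_bits, layer_bops(macs, weight_bits=4, act_bits=act_bits))
```

Under this definition, halving the activation bit-width of a layer halves its BOPs at a fixed weight bit-width, which is why per-layer activation precision directly moves a model along the horizontal axis of an accuracy vs. BOPs plot.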