MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

Yunshan Zhong, Yuyao Zhou, Fei Chao, Rongrong Ji

TL;DR

MBQuant targets arbitrary bit-width quantized neural networks (QNNs). It substantially reduces weight-switching error with a multi-branch architecture in which every branch uses fixed 2-bit weights and any target bit-width is realized through branch selection. It further mitigates activation-quantization error with an amortization branch selection strategy and strengthens cross-branch guidance with in-place distillation. Across ImageNet and CIFAR benchmarks, MBQuant delivers consistent accuracy gains over prior arbitrary bit-width methods while maintaining comparable or lower storage costs. The code is publicly available.

Abstract

Arbitrary bit-width network quantization has received significant attention due to its high adaptability to various bit-width requirements during runtime. However, in this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by switching weight and activation bit-widths, which limits performance. To address this issue, we propose MBQuant, a novel method that utilizes a multi-branch topology for arbitrary bit-width quantization. MBQuant duplicates the network body into multiple independent branches, where the weights of each branch are quantized to a fixed 2 bits and the activations remain in the input bit-width. The computation for a desired bit-width is completed by selecting an appropriate number of branches that satisfies the original computational constraint. By fixing the weight bit-width, this approach substantially reduces quantization errors caused by switching weight bit-widths. Additionally, we introduce an amortization branch selection strategy that distributes the quantization errors caused by switching activation bit-widths among branches to improve performance. Finally, we adopt an in-place distillation strategy that facilitates guidance between branches to further enhance MBQuant's performance. Extensive experiments demonstrate that MBQuant achieves significant performance gains compared to existing arbitrary bit-width quantization methods. Code is at https://github.com/zysxmu/MultiQuant.
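
The branch-selection idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The names `quantize_uniform`, `num_branches`, and `multi_branch_dot` are hypothetical, and the sketch rests on three assumptions that the abstract does not confirm: a target bit-width b is served by b / 2 branches, odd bit-widths are simply rejected, and branch outputs are combined by summation. A plain uniform quantizer stands in for whatever quantizer the paper uses.

```python
def quantize_uniform(values, bits, lo, hi):
    """Snap each value to the nearest of 2**bits evenly spaced levels in [lo, hi]."""
    steps = 2 ** bits - 1
    step = (hi - lo) / steps
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)
        out.append(lo + round((clipped - lo) / step) * step)
    return out


def num_branches(target_bits, branch_bits=2):
    """Assumed rule: total weight bits of the selected branches equals target_bits."""
    if target_bits <= 0 or target_bits % branch_bits != 0:
        # Simplification of this sketch: only multiples of branch_bits are handled.
        raise ValueError("target_bits must be a positive multiple of branch_bits")
    return target_bits // branch_bits


def multi_branch_dot(x, branch_weights, target_bits):
    """Dot-product 'layer' evaluated with the first k fixed-2-bit branches.

    x              : activations, assumed to lie in [0, 1]
    branch_weights : one weight vector per branch, assumed to lie in [-1, 1]
    target_bits    : requested bit-width; activations are quantized to it
    """
    k = num_branches(target_bits)
    if k > len(branch_weights):
        raise ValueError("not enough branches for the requested bit-width")
    xq = quantize_uniform(x, target_bits, 0.0, 1.0)
    total = 0.0
    for w in branch_weights[:k]:
        # Every branch keeps 2-bit weights regardless of target_bits.
        wq = quantize_uniform(w, 2, -1.0, 1.0)
        total += sum(a * b for a, b in zip(xq, wq))  # assumed combination: sum
    return total


if __name__ == "__main__":
    x = [0.0, 1.0]
    branches = [[1.0, 1.0], [-1.0, 1.0]]
    print(multi_branch_dot(x, branches, target_bits=2))  # uses one branch
    print(multi_branch_dot(x, branches, target_bits=4))  # uses two branches
```

Switching the target bit-width in this sketch changes only how many branches are evaluated and how the activations are quantized. The stored 2-bit weights never change, which is the property the abstract credits with reducing weight-switching error.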

Paper Structure

This paper contains 19 sections, 11 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Illustration of the framework of (a) previous methods [yu2021any, jin2020adabits] and (b) our MBQuant.
  • Figure 2: Illustration of (a) the serial branch selection strategy and (b) our amortization branch selection strategy.
  • Figure 3: Influence of the amortization branch selection strategy and in-place distillation on the top-1 accuracy of ResNet-20 on CIFAR-100. "w/o" indicates without, "selection" indicates "amortization branch selection strategy", and "In-place" denotes "in-place distillation".