Up or Down? Adaptive Rounding for Post-Training Quantization

Markus Nagel, Rana Ali Amjad, Mart van Baalen, Christos Louizos, Tijmen Blankevoort

TL;DR

This work addresses the limitations of rounding-to-nearest in post-training weight quantization. Starting from a second-order Taylor expansion of the task loss, the authors pose per-layer weight rounding as a quadratic unconstrained binary optimization (QUBO) problem. To make this tractable, they simplify the Hessian under structural assumptions so that the problem reduces to layer-wise local losses, and they solve it with a continuous relaxation (AdaRound) that combines a rectified sigmoid with a regularizer driving each rounding variable to 0 or 1. An asymmetric reconstruction variant additionally accounts for the quantization error accumulated from preceding layers. The method is data-efficient: it needs only a small amount of unlabelled data and no fine-tuning. It improves 4-bit weight quantization accuracy on networks such as ResNet-18/50, InceptionV3, MobilenetV2, and DeeplabV3+, keeping Resnet18 and Resnet50 within 1% of their floating-point accuracy. Empirically, AdaRound outperforms bias correction and other PTQ methods on ImageNet classification and semantic segmentation, establishing a new state-of-the-art in post-training weight quantization, and it remains effective when the number and source dataset of the calibration images are varied. Overall, AdaRound offers a principled way to deploy low-bit quantized networks without re-training.
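To make the "up or down" framing concrete, here is a toy sketch (not the authors' method or code) that brute-forces every floor/ceil choice for a handful of weights and compares the best layer-wise reconstruction loss with rounding-to-nearest. The data `X`, weights `w`, step size `scale`, and the helper `local_loss` are all hypothetical, and clipping to a finite bit-width grid is ignored for brevity.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 6))   # hypothetical calibration inputs (64 samples, 6 features)
w = rng.normal(size=6)         # hypothetical floating-point weights of one output unit
scale = 0.5                    # hypothetical quantization step size

def local_loss(w_hat):
    """Mean squared error between the float and quantized layer outputs."""
    return float(np.mean((X @ w - X @ w_hat) ** 2))

# Baseline: round every weight to its nearest grid point.
nearest = scale * np.round(w / scale)
nearest_loss = local_loss(nearest)

# Exhaustive search: each weight is rounded down (bit 0) or up (bit 1).
w_floor = np.floor(w / scale)
best_loss, best_bits = np.inf, None
for bits in itertools.product((0, 1), repeat=w.size):
    w_hat = scale * (w_floor + np.array(bits))
    loss = local_loss(w_hat)
    if loss < best_loss:
        best_loss, best_bits = loss, bits

print("nearest loss:", nearest_loss)
print("best loss:   ", best_loss, "with up/down choices", best_bits)
```

Because rounding-to-nearest is itself one of the enumerated choices, the exhaustive optimum can never be worse than it. The search visits 2^n candidates, which is why a relaxation such as AdaRound is needed for real layers.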

Abstract

When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of Resnet18 and Resnet50 to 4 bits while staying within an accuracy loss of 1%.
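The step from the Taylor expansion to a binary optimization problem can be written out as follows. This is a sketch in assumed notation: $\mathbf{g}^{(\mathbf{w})}$ and $\mathbf{H}^{(\mathbf{w})}$ denote the gradient and Hessian of the task loss $\mathcal{L}$ with respect to the weights, $\Delta\mathbf{w}$ is the perturbation caused by quantization, and $s$ is the quantization step size. None of these symbols appear in the abstract itself.

```latex
% Second-order Taylor expansion of the task loss around the trained weights
\mathbb{E}\left[\mathcal{L}(\mathbf{x},\mathbf{y},\mathbf{w}+\Delta\mathbf{w})
              - \mathcal{L}(\mathbf{x},\mathbf{y},\mathbf{w})\right]
  \approx \Delta\mathbf{w}^{T}\mathbf{g}^{(\mathbf{w})}
        + \tfrac{1}{2}\,\Delta\mathbf{w}^{T}\mathbf{H}^{(\mathbf{w})}\Delta\mathbf{w}

% For a network trained to convergence the gradient term is assumed negligible,
% leaving a quadratic objective in which each weight has only two options
\arg\min_{\Delta\mathbf{w}} \; \Delta\mathbf{w}^{T}\mathbf{H}^{(\mathbf{w})}\Delta\mathbf{w},
\qquad
\Delta w_i \in \left\{\, s\lfloor w_i/s \rfloor - w_i,\;\; s\lceil w_i/s \rceil - w_i \,\right\}
```

Since every $\Delta w_i$ takes one of two values (round down or round up), the objective is a quadratic function of binary variables, which is what makes the problem a QUBO.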

Paper Structure

This paper contains 23 sections, 26 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: Correlation between the cost in Eq. \ref{eq:taylor_opt} vs ImageNet validation accuracy (%) of 100 stochastic rounding vectors $\widehat{\mathbf{w}}$ for $4$-bit quantization of only the first layer of Resnet18.
  • Figure 2: Effect of annealing $b$ on regularization term Eq. \ref{eq:reg_func}.
  • Figure 3: Comparison of $h(\mathbf{V}_{i,j})$ before (x-axis, corresponding to floating point weights) vs after (y-axis) optimizing Eq. \ref{eq:mse_relax}. We see that all $h(\mathbf{V}_{i,j})$ have converged to $0$ or $1$. Top left and lower right quadrants indicate the weights that have different rounding using Eq. \ref{eq:mse_relax} vs rounding-to-nearest.
  • Figure 4: The effect on ImageNet validation accuracy of using different numbers of images, drawn from different datasets, for AdaRound optimization.
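The rectified sigmoid $h(\mathbf{V}_{i,j})$ and the annealed regularizer that Figures 2 and 3 refer to can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the stretch constants `zeta=1.1` and `gamma=-0.1`, the functional form `1 - |2h - 1|**beta`, and the name `beta` for the annealed exponent (written $b$ in the Figure 2 caption) are assumptions recalled from the paper rather than taken from the captions, and should be checked against it.

```python
import numpy as np

def rectified_sigmoid(v, zeta=1.1, gamma=-0.1):
    """Stretched sigmoid clipped to [0, 1]; unlike a plain sigmoid it reaches 0 and 1 exactly."""
    s = 1.0 / (1.0 + np.exp(-v))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def rounding_regularizer(v, beta):
    """Penalty that is 0 when h(v) is exactly 0 or 1 and largest at h(v) = 0.5."""
    h = rectified_sigmoid(v)
    return float(np.sum(1.0 - np.abs(2.0 * h - 1.0) ** beta))

# Hypothetical rounding variables: undecided, leaning up, fully down, fully up.
v = np.array([0.0, 1.0, -10.0, 10.0])
print("h(v):", rectified_sigmoid(v))

# A large exponent penalizes almost every non-binary value equally;
# a small exponent penalizes mainly values close to 0.5.
for beta in (20.0, 8.0, 2.0):
    print(f"beta={beta:>4}: f_reg={rounding_regularizer(v, beta):.4f}")
```

Under these assumptions, variables that have already saturated at 0 or 1 contribute nothing to the penalty, while the exponent controls how strongly partially decided values are penalized, which is the effect the Figure 2 caption describes as annealing.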

Theorems & Definitions (1)

  • Example 1