On the impact of the parametrization of deep convolutional neural networks on post-training quantization

Samy Houache, Jean François Aujol, Yann Traonmilin

TL;DR

This paper tackles the challenge of guaranteeing performance when post-training quantization is applied to CNNs. It develops a theory that replaces the maximum layer norm $r$ of the worst-case bound, which is raised to a power that grows with depth, by a layerwise mean norm $r_{mean}$. It shows that the worst-case output error $\sup_{x\in\Omega} \|R_\theta(x) - R_{\theta'}(x)\|_{\infty}$ scales as $\max(D,1)\, (r_{mean})^{L-1} \sum_{\ell=1}^{L} N_{\ell-1}\, \|\theta-\theta'\|_\infty$, with CNN-specific refinements using $r_{conv}$ and $p_l^2 c_{l-1}$. The results relax an earlier assumption by allowing arbitrary layerwise bounds $r_\ell$, and they give far tighter bounds than prior work, validated on pretrained ResNet and MobileNetV2 models with multiple quantization schemes. The practical impact is twofold: the theory explains why weight quantization can work well in real networks, and it guides preprocessing steps such as cross-layer equalization that tighten the bounds further. The work also outlines future directions toward Transformers and probabilistic analyses that would complement the deterministic guarantees.
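The shape of this bound can be made concrete with a short numerical sketch. It is not the paper's code and rests on assumptions: each layer's norm $r_\ell$ is taken to be the maximum row-wise $\ell_1$ norm of its weight matrix (the $\ell_\infty \to \ell_\infty$ operator norm), and $r_{mean}$ is assumed to be the largest geometric mean of the $L-1$ norms that remain after leaving out one layer, a reading suggested by the "maximum geometric mean" wording of Figure 2. All function names are hypothetical, and any constants or extra terms in the exact statement of Theorem 4.1 are not reproduced.

```python
import math


def layer_norm(weight):
    """Maximum row-wise l1 norm of a weight matrix given as a list of rows.

    Assumption: this l_inf -> l_inf operator norm stands in for r_l.
    """
    return max(sum(abs(w) for w in row) for row in weight)


def r_max(norms):
    """Worst-case norm r = max_l r_l, as used by a single-r analysis."""
    return max(norms)


def r_mean(norms):
    """Assumed definition: max over l of the geometric mean of the other L-1 norms."""
    if len(norms) < 2 or min(norms) <= 0:
        raise ValueError("need at least two layers with positive norms")
    total_log = sum(math.log(r) for r in norms)
    k = len(norms) - 1
    # Leaving out one layer: geometric mean of the remaining L-1 norms.
    return max(math.exp((total_log - math.log(r)) / k) for r in norms)


def depth_factor_gain(norms):
    """Ratio r^(L-1) / r_mean^(L-1): how much the depth factor shrinks."""
    k = len(norms) - 1
    return (r_max(norms) / r_mean(norms)) ** k


def mean_norm_bound(norms, widths, D, delta):
    """max(D, 1) * r_mean^(L-1) * sum_{l=1}^{L} N_{l-1} * delta.

    norms = [r_1, ..., r_L], widths = [N_0, ..., N_L],
    delta = ||theta - theta'||_inf.
    """
    L = len(norms)
    return max(D, 1.0) * r_mean(norms) ** (L - 1) * sum(widths[:L]) * delta


if __name__ == "__main__":
    norms = [4.0, 0.5, 2.0, 1.0]  # illustrative layer norms, not from the paper
    widths = [3, 8, 8, 8, 1]      # N_0, ..., N_4
    print(r_max(norms), r_mean(norms), depth_factor_gain(norms))
    print(mean_norm_bound(norms, widths, D=1.0, delta=0.01))
```

Because a geometric mean never exceeds the largest of its terms, $r_{mean} \le r$, so the factor $(r/r_{mean})^{L-1}$ is at least 1 and can grow exponentially with depth when a few layers have much larger norms than the rest. With the illustrative norms above it equals 8.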

Abstract

This paper introduces novel theoretical approximation bounds for the output of quantized neural networks, with a focus on convolutional neural networks (CNNs). By considering a layerwise parametrization and focusing on the quantization of weights, we provide bounds that are several orders of magnitude tighter than state-of-the-art results on classical deep convolutional neural networks such as MobileNetV2 and ResNets. These gains come from improving how the approximation bounds behave with respect to depth, the parameter with the most impact on the approximation error induced by quantization. To complement our theoretical results, we provide a numerical exploration of our bounds on MobileNetV2 and ResNets.

Paper Structure

This paper contains 23 sections, 9 theorems, 105 equations, 12 figures, 1 table.

Key Result

Theorem 3.5 (previous bound from gonon2023approximation)

For any architecture $(L,\mathbf{N})$ and any $r \geq 1$, denoting $N := \max_{l=0,\ldots,L} N_l$, we have, for any $\theta,\theta' \in \Theta_{L,\mathbf{N}}(r)$:

Figures (12)

  • Figure 1: Improvement, in log scale, over the previous bound (eq:orig_bound1) on ResNet18 without BatchNorm and without biases, as a function of the number of quantization bits: our error estimate is $10^{8}$ times tighter.
  • Figure 2: Comparison between the maximum geometric mean term $r_{conv}$ (green) used in our bound (eq:th_conv) and the maximum weight norm $r$ (red) used in the previous bound (eq:orig_bound1), for ResNet18, ResNet50 and MobileNetV2 without BatchNorm, showing that $r_{conv}$ is smaller than $r$ for all models.
  • Figure 3: Comparison, in log scale, between our bound (eq:th_conv) and the previous bound (eq:orig_bound1) on the convolutional part of (a) MobileNetV2 and (b) ResNet50, as a function of the number of bits. Our bound is approximately $10^{56}$ times tighter for MobileNetV2 and $10^{27}$ times tighter for ResNet50.
  • Figure 4: Effect of quantization on accuracy on Tiny ImageNet for three quantization functions (round, uniform and Adaround). Quantization reduces memory requirements while maintaining or approaching the base model's accuracy; the number of bits needed to recover the base model's accuracy depends on the quantization function.
  • Figure 5: Comparison for MLPs of depths 5, 7, 9 and 11 on MNIST. (a) shows that the gap between the previous bound and ours grows exponentially with depth, and (b) shows that our bound reduces this exponential dependence across bit-widths.
  • ...and 7 more figures
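Figures 1 and 3 plot the bounds against the number of quantization bits. Since the bound is linear in $\|\theta-\theta'\|_\infty$, the link to bit-width goes through the worst-case weight error of the quantizer. The sketch below assumes a symmetric, per-tensor, uniform round-to-nearest quantizer, one common choice; the paper's "round" and "uniform" functions may be defined differently, and Adaround, which learns the rounding direction, is not covered. The function names are hypothetical.

```python
def quantize_symmetric(weights, bits):
    """Symmetric per-tensor uniform quantizer with round-to-nearest.

    Assumption: the grid is {k * step : |k| <= 2**(bits-1) - 1} with
    step = max|w| / (2**(bits-1) - 1). Returns (quantized weights, step).
    """
    if bits < 2:
        raise ValueError("bits must be >= 2")
    w_max = max(abs(w) for w in weights)
    if w_max == 0.0:
        return [0.0] * len(weights), 0.0
    q_max = 2 ** (bits - 1) - 1
    step = w_max / q_max
    # Round each weight to the nearest grid point; the clip only guards
    # against floating-point overshoot, since |w| <= w_max.
    quantized = [max(-q_max, min(q_max, round(w / step))) * step for w in weights]
    return quantized, step


def sup_error(a, b):
    """||theta - theta'||_inf for two flat weight lists."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    weights = [0.3, -1.2, 0.75, 0.05, -0.4]  # illustrative values
    for bits in (2, 4, 8):
        q, step = quantize_symmetric(weights, bits)
        # Round-to-nearest guarantees sup_error <= step / 2.
        print(bits, sup_error(weights, q), step / 2)
```

The step roughly halves with each extra bit, so the guaranteed $\|\theta-\theta'\|_\infty$, and with it any bound linear in that quantity, decreases roughly geometrically in the number of bits. A single scale per tensor is the simplest choice; per-channel scales would give a smaller step for channels whose weights are small.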

Theorems & Definitions (24)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Theorem 3.5: Previous bound from gonon2023approximation
  • Theorem 4.1: General approximation bound
  • Remark 4.2: Improved factor $\sum_{l=1}^L N_{l-1}$
  • Remark 4.3: Weakened condition for the domain of parameters $r_\ell$
  • Theorem 4.4: Approximation bound for CNN
  • Lemma A.1
  • ...and 14 more
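The CNN refinement in Theorem 4.4 involves the quantity $p_l^2 c_{l-1}$. One plausible reading, assumed here rather than taken from the theorem statement, is that $p_l$ is the spatial kernel size and $c_{l-1}$ the number of input channels of convolution $l$, so that $p_l^2 c_{l-1}$ is the fan-in of a single output unit, whereas $N_{l-1}$ in Remark 4.2 counts every input neuron ($c_{l-1}$ times the feature-map height and width). The sketch below compares the two sums; the `ConvLayer` record, its field names and the layer shapes are hypothetical, loosely modeled on early ResNet layers.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of one convolution (illustrative field names)."""
    kernel: int       # p_l: the kernel is p_l x p_l (assumed meaning)
    in_channels: int  # c_{l-1}
    in_height: int    # height of the input feature map
    in_width: int     # width of the input feature map


def fan_in_sum(layers):
    """sum_l p_l^2 * c_{l-1}: inputs seen by one output unit of each conv."""
    return sum(layer.kernel ** 2 * layer.in_channels for layer in layers)


def neuron_sum(layers):
    """sum_l N_{l-1} with N_{l-1} = c_{l-1} * H * W (all input neurons)."""
    return sum(layer.in_channels * layer.in_height * layer.in_width for layer in layers)


if __name__ == "__main__":
    layers = [
        ConvLayer(kernel=7, in_channels=3, in_height=224, in_width=224),
        ConvLayer(kernel=3, in_channels=64, in_height=56, in_width=56),
        ConvLayer(kernel=3, in_channels=128, in_height=28, in_width=28),
    ]
    print(fan_in_sum(layers), neuron_sum(layers))
```

For these shapes the fan-in sum is 1875 against 451584 for the neuron count, a ratio of roughly 240, which illustrates why a fan-in factor matters for convolutional layers with large feature maps.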