Quantized Convolutional Neural Networks Through the Lens of Partial Differential Equations

Ido Ben-Yair, Gil Ben Shalom, Moshe Eliasof, Eran Treister

TL;DR

This work investigates quantized neural networks through a partial differential equation (PDE) lens, introducing total-variation (TV) based edge-aware smoothing and forward-stable architectures to mitigate quantization noise. By treating activation error as diffusion-like noise and enforcing stability via symmetric dynamics, the authors design TV-augmented networks and stable variants of ResNet, MobileNetV2, and PDE-GCNs. Empirical results across image classification and semi-supervised graph tasks show that stable quantized models can achieve accuracy comparable to that of full-precision networks while using fewer parameters, with TV layers improving quantization fidelity. The findings suggest that PDE-inspired stability enhances reliability and efficiency for edge and real-time deployments, including autonomous driving scenarios.

Abstract

Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs, especially on low-resource edge devices. However, fixed-point arithmetic is not a natural fit for the computations involved in neural networks. In this work, we explore ways to improve quantized CNNs using a PDE-based perspective and analysis. First, we harness the total variation (TV) approach to apply edge-aware smoothing to the feature maps throughout the network. This aims to reduce outliers in the distribution of values and promote piecewise-constant maps, which are more suitable for quantization. Second, we consider symmetric and stable variants of common CNNs for image classification, and of Graph Convolutional Networks (GCNs) for graph node classification. We demonstrate through several experiments that the property of forward stability preserves the action of a network under different quantization rates. As a result, stable quantized networks behave similarly to their non-quantized counterparts even though they rely on fewer parameters. We also find that stability sometimes even improves accuracy. These properties are of particular interest for sensitive, resource-constrained, low-power or real-time applications like autonomous driving.
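Forward stability can be made concrete with a toy model. The sketch below assumes a symmetric residual step of the form $x_{j+1} = x_j - h K_j^\top \sigma(K_j x_j)$ with $\sigma$ = ReLU, using small dense matrices in place of convolutions; this form, the step-size rule `h = 1/||K||_F^2`, and all function names are illustrative assumptions rather than the paper's architecture. Because ReLU is monotone and $h\|K_j\|_2^2 \le 1$, each step is non-expansive, so the gap between a full-precision pass and a pass whose activations are quantized after every layer can grow by at most the quantization error injected at that layer (additively, not multiplicatively).

```python
import math
import random

def matvec(K, x):
    """Dense matrix-vector product K @ x (hypothetical stand-in for a convolution)."""
    return [sum(k * v for k, v in zip(row, x)) for row in K]

def matvec_t(K, y):
    """Transposed product K^T @ y (stand-in for the transposed convolution)."""
    return [sum(K[i][j] * y[i] for i in range(len(K))) for j in range(len(K[0]))]

def symmetric_step(K, x, h):
    """Assumed symmetric residual step: x - h * K^T relu(K x)."""
    z = [max(0.0, v) for v in matvec(K, x)]  # ReLU is monotone non-decreasing
    g = matvec_t(K, z)
    return [a - h * b for a, b in zip(x, g)]

def quantize(x, bits=4, alpha=1.0):
    """Symmetric uniform quantizer: clip, scale, round, rescale."""
    q = 2 ** (bits - 1) - 1
    return [round(min(max(v, -alpha), alpha) * q / alpha) * alpha / q for v in x]

def dist(a, b):
    """Euclidean distance between two activation vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

rng = random.Random(0)
n, depth = 8, 10

layers = []
for _ in range(depth):
    K = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    fro2 = sum(k * k for row in K for k in row)
    # h = 1/||K||_F^2 implies h * ||K||_2^2 <= 1, which makes the step non-expansive
    layers.append((K, 1.0 / fro2))

x_full = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x_quant = quantize(x_full)
divergence = [dist(x_full, x_quant)]  # ||full - quantized|| at the input and after each layer
injected = []                          # quantization error added at each layer

for K, h in layers:
    x_full = symmetric_step(K, x_full, h)
    pre = symmetric_step(K, x_quant, h)
    x_quant = quantize(pre)
    injected.append(dist(pre, x_quant))
    divergence.append(dist(x_full, x_quant))

for j, d in enumerate(divergence):
    print(f"stage {j:2d}: divergence = {d:.4f}")
```

A non-symmetric residual step such as $x + hK_2\sigma(K_1x)$ carries no such guarantee, which is the kind of behavior Figure 3 contrasts against the symmetric variants.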

Paper Structure

This paper contains 24 sections, 29 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: An example of a signal that has been uniformly quantized to 4 bits per sample. The histogram of the original signal is given in (a). The values are then clipped to the range $[-\alpha,\alpha]$ (here $\alpha=0.5$), multiplied by $7/\alpha$ (where $7=2^{4-1}-1$ for 4 bits), and rounded to integer values in (b); finally, they are scaled back to the original range in (c) by multiplying by the reciprocal factor $\alpha/7$.
  • Figure 2: An example of a feature map from the 4th layer of the ResNet50 encoder for an image. (a) and (b) show the feature map before and after 3 iterations of the paper's TV-smoothing operator (the smoothing-layer equation), with $\gamma^2=0.1$. Fine details are preserved after this rather light smoothing. (c) shows the corresponding value distributions: the TV smoothing eliminates outliers in addition to slightly denoising the image. After a 4-bit signed quantization with $\alpha = 0.5$, the MSE between the original and quantized maps is 0.16 for the original map and 0.05 for the TV-smoothed map. Hence, the TV smoothing produces a feature map that is better suited for quantization. The distributions in (c) have been smoothed slightly for visual clarity.
  • Figure 3: Per-layer MSE between the activation maps of symmetric and non-symmetric network pairs. Each line represents a pair of networks, one with quantized activation maps and one without. The values are normalized per layer to account for the different dimensions of each layer. In all cases, the symmetric variants (in red) exhibit a bounded divergence from the full-precision activations, while the divergence of the non-symmetric networks (in blue) grows as information propagates through the layers, indicating that they are unstable. Top to bottom: ResNet56/CIFAR-10, ResNet56/CIFAR-100 and MobileNetV2/CIFAR-100. Both networks in each pair achieve comparable classification accuracy.
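The quantizer in Figure 1 and the effect described in Figure 2 can be illustrated on a 1D toy signal. The sketch below is a minimal, hypothetical illustration in plain Python: `quantize_uniform` follows the clip, scale, round, and rescale steps of Figure 1, while `tv_smooth_step` is an assumed explicit gradient step on a smoothed total-variation energy $\sum_i \sqrt{(x_{i+1}-x_i)^2+\gamma^2}$, with step size `h` and `gamma2` playing the role of $\gamma^2$. It is not the paper's smoothing layer, whose exact form is not reproduced here; the function names, the step size, and the toy signal are all illustrative assumptions.

```python
import math

def quantize_uniform(x, bits=4, alpha=0.5):
    """Clip to [-alpha, alpha], scale by q/alpha, round, and scale back by alpha/q."""
    q = 2 ** (bits - 1) - 1  # 7 for 4 bits, as in Figure 1
    # round() picks the nearest integer level (exact ties go to the even integer)
    return [round(min(max(v, -alpha), alpha) * q / alpha) * alpha / q for v in x]

def tv_smooth_step(x, h=0.1, gamma2=0.1):
    """One explicit gradient step on sum_i sqrt((x[i+1] - x[i])**2 + gamma2).

    Hypothetical 1D stand-in for a TV-smoothing layer; h and gamma2 are assumptions.
    """
    n = len(x)
    dx = [x[i + 1] - x[i] for i in range(n - 1)]        # forward differences D x
    flux = [d / math.sqrt(d * d + gamma2) for d in dx]  # bounded in (-1, 1), so large jumps are not over-penalized
    out = []
    for j in range(n):
        left = flux[j - 1] if j >= 1 else 0.0
        right = flux[j] if j < n - 1 else 0.0
        out.append(x[j] - h * (left - right))           # x - h * D^T flux
    return out

def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Toy piecewise-constant signal with two outlier spikes (illustrative data, not from the paper)
signal = [0.1] * 10 + [2.0] + [0.1] * 10 + [-0.3] * 10 + [-1.8] + [-0.3] * 10

smoothed = signal
for _ in range(3):  # 3 iterations, matching the count used in Figure 2
    smoothed = tv_smooth_step(smoothed)

err_raw = mse(signal, quantize_uniform(signal))
err_tv = mse(smoothed, quantize_uniform(smoothed))
print(f"quantization MSE, raw signal:  {err_raw:.4f}")
print(f"quantization MSE, TV-smoothed: {err_tv:.4f}")
```

On this toy input the TV steps shrink the two spikes while leaving the flat regions and the step edge nearly untouched, so less of the signal is clipped to $[-\alpha,\alpha]$ and the quantization error drops. This mirrors, but does not reproduce, the trend reported in Figure 2.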