Assessing the Potential for Catastrophic Failure in Dynamic Post-Training Quantization

Logan Frank, Paul Ardis

TL;DR

This paper addresses the risk of catastrophic failure when applying dynamic post-training quantization (DPTQ) to neural networks in safety-critical settings. It introduces a knowledge-distillation and reinforcement-learning framework with quantization-aware training to learn robust (f_R, pi_R) and detrimental (f_D, pi_D) network-policy pairs under a fixed total bit-width budget, enabling worst-case analysis. The authors demonstrate the existence of brittle policies that cause large accuracy degradations under quantization, identify transitory input points where failures are likely, and perform layer-wise and robustness analyses to reveal vulnerability patterns. The work highlights the necessity of safety-conscious robustness evaluation for quantization techniques and motivates further study into failure modes under adaptive quantization in real-world deployments.

Abstract

Post-training quantization (PTQ) has recently emerged as an effective tool for reducing the computational complexity and memory usage of a neural network by representing its weights and activations with lower precision. While this paradigm has shown great success in lowering compute and storage costs, there is the potential for drastic performance reduction depending upon the distribution of inputs experienced in inference. When considering possible deployment in safety-critical environments, it is important to investigate the extent of potential performance reduction, and what characteristics of input distributions may give rise to this reduction. In this work, we explore the idea of extreme failure stemming from dynamic PTQ and formulate a knowledge distillation and reinforcement learning task to learn a network and bit-width policy pair such that catastrophic failure under quantization is analyzed in terms of worst case potential. Our results confirm the existence of this "detrimental" network-policy pair, with several instances demonstrating performance reductions in the range of 10-65% in accuracy, compared to their "robust" counterparts encountering a <2% decrease. From systematic experimentation and analyses, we also provide an initial exploration into points at highest vulnerability. While our results represent an initial step toward understanding failure cases introduced by PTQ, our findings ultimately emphasize the need for caution in real-world deployment scenarios. We hope this work encourages more rigorous examinations of robustness and a greater emphasis on safety considerations for future works within the broader field of deep learning.
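To make "representing its weights and activations with lower precision" concrete, here is a minimal sketch of uniform quantization at a chosen bit-width, followed by dequantization so the rounding error can be measured. It is an illustration only: the abstract does not specify the quantizer the paper uses, so the min/max range calibration and the names `fake_quantize` and `max_abs_error` are assumptions.

```python
def fake_quantize(values, bits):
    """Snap each value to one of 2**bits evenly spaced levels in [min, max].

    Assumption: range calibration uses the min and max of `values`;
    the paper's actual quantizer may differ.
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input has nothing to quantize.
        return list(values)
    steps = 2 ** bits - 1          # number of intervals between levels
    scale = (hi - lo) / steps      # width of one interval
    return [lo + round((v - lo) / scale) * scale for v in values]


def max_abs_error(values, bits):
    """Largest absolute difference between the inputs and their quantized form."""
    quantized = fake_quantize(values, bits)
    return max(abs(a - b) for a, b in zip(values, quantized))


activations = [0.0, 0.1, 0.37, 0.82, 1.0]
for bits in (2, 4, 8):
    print(bits, max_abs_error(activations, bits))
```

The rounding error is bounded by half an interval width, so it shrinks as the bit-width grows. A dynamic policy that assigns very few bits to an input therefore trades accuracy for cost on that input.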

Paper Structure

This paper contains 12 sections, 11 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of our complete pipeline. (A) Performing knowledge distillation on a black box network to obtain a white box model. (B) Training a quantization-aware student and bit-width policy via knowledge distillation and reinforcement learning. (C) Training the same student to retain knowledge at full-precision. Symbols are defined in text.
  • Figure 2: Train and test accuracy of the highlighted models from Table \ref{tab:main_policies} when certain layers are quantized and others are left at full-precision. "Before" quantizes all layers before that point (the relative layer on the x-axis), "after" quantizes all layers after that point, and "single" quantizes just that single layer. All other layer activations remain at full-precision.
  • Figure 3: Histogram of layer activation values (with the bin corresponding to $0$ and near-$0$ values removed) before and after quantization for the highlighted ResNet18 models in Table \ref{tab:main_policies}, corresponding to the FP and Q rows, respectively.
  • Figure 4: A generalization/robustness analysis for the highlighted ResNet18 models from Table \ref{tab:main_policies} when transforming/corrupting the test set images with common operations. Degree and test accuracy are the $x$ and $y$ axes, respectively.
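The three layer-selection modes named in the Figure 2 caption ("before", "after", "single") can be sketched as a boolean mask over layer indices. This is a hypothetical helper, not the authors' code: the function name `quantized_layer_mask` is invented here, and it assumes that "before" and "after" exclude the layer at the chosen point, which the caption does not state explicitly.

```python
def quantized_layer_mask(num_layers, point, mode):
    """Return one bool per layer: True means that layer is quantized.

    Assumption: "before" and "after" exclude the layer at `point`;
    the caption does not say whether the boundary layer is included.
    """
    if not 0 <= point < num_layers:
        raise ValueError("point must be a valid layer index")
    if mode == "before":
        return [i < point for i in range(num_layers)]
    if mode == "after":
        return [i > point for i in range(num_layers)]
    if mode == "single":
        return [i == point for i in range(num_layers)]
    raise ValueError(f"unknown mode: {mode!r}")


# Layers marked False stay at full precision, as in the caption.
for mode in ("before", "after", "single"):
    print(mode, quantized_layer_mask(5, 2, mode))
```

Sweeping `point` across all layers for each mode gives the x-axis of a plot like Figure 2, with accuracy measured under each resulting mask.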