Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions

Tahmid Hasan Prato; Seijoon Kim; Lizhong Chen; Sanghyun Hong

TL;DR

The paper tackles the vulnerability of DNNs to parameter bit-flips by introducing Hessian-aware training, which minimizes the Hessian trace $\mathrm{Tr}(H)$ to flatten the loss surface and reduce sensitivity to parameter perturbations. It approximates the trace with Hutchinson's method and uses the Top-$p$ eigenvalues (with $p$ around 50) to stabilize training, adding the resulting term as a regularizer in SGD-based optimization. Across MNIST, CIFAR-10, and ImageNet, the method reduces the fraction of erratic parameters by 6–12% and increases the number of bit-flips an adversary needs to induce large accuracy drops by 2–3×, while preserving baseline accuracy. The approach is complementary to hardware defenses (e.g., NeuroPot, RADAR) and remains practical for large models via layer-sampling strategies that limit the training overhead, enabling more robust deployment on error-prone hardware.
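To make the trace approximation concrete, here is a minimal sketch of Hutchinson's estimator, which averages $v^\top H v$ over random Rademacher probes $v$ and needs only Hessian-vector products. The names `hutchinson_trace` and `toy_hvp` and the small matrix `H` are illustrative assumptions, not the paper's implementation; in actual training the Hessian-vector product would come from automatic differentiation rather than an explicit matrix.

```python
import random

def hutchinson_trace(hvp, dim, num_probes=100, seed=0):
    """Estimate Tr(H) using only Hessian-vector products hvp(v) = H v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = hvp(v)
        # Accumulate v^T H v, whose expectation over v equals Tr(H).
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_probes

# Hypothetical stand-in for a model's Hessian: a small symmetric matrix.
H = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.3],
     [0.0, 0.3, 3.0]]

def toy_hvp(v):
    # Explicit matrix-vector product; autodiff would replace this in practice.
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in H]

estimate = hutchinson_trace(toy_hvp, dim=3, num_probes=2000)
exact = sum(H[i][i] for i in range(3))
print(f"estimate={estimate:.3f}, exact={exact:.3f}")
```

The estimate converges to the exact trace as the number of probes grows; for a diagonal Hessian every single probe already returns the exact value, since each squared probe entry equals 1.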

Abstract

Deep neural networks are not resilient to parameter corruptions: even a single bit-wise error in their parameters in memory can cause an accuracy drop of over 10%, and in the worst cases, up to 99%. This susceptibility poses great challenges in deploying models on computing platforms, where adversaries can induce bit-flips through software or where bit-wise corruptions may occur naturally. Most prior work addresses this issue with hardware or system-level approaches, such as integrating additional hardware components to verify a model's integrity at inference. However, these methods have not been widely deployed, as they require infrastructure or platform-wide modifications. In this paper, we propose a new approach to addressing this issue: training models to be more resilient to bit-wise corruptions of their parameters. Our approach, Hessian-aware training, promotes models with flatter loss surfaces. We show that existing training methods designed to improve generalization through Hessian-based approaches do not enhance resilience to parameter corruptions. In contrast, models trained with our method demonstrate increased resilience to parameter corruptions, most notably a 20–50% reduction in the number of bits whose individual flipping leads to a 90–100% accuracy drop. Moreover, we show the synergy between our method and existing hardware and system-level defenses.
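As an illustration of why one flipped bit can be so damaging, the sketch below flips individual bits in the encoding of a single weight. It assumes parameters are stored as IEEE-754 float32, which the abstract does not state; the helper `flip_bit` and the example weight are hypothetical.

```python
import struct

def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of value's IEEE-754 float32 encoding."""
    # Reinterpret the float32 as a 32-bit unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Toggle the chosen bit and reinterpret the result as a float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped

weight = 0.5  # hypothetical parameter value

flipped_exponent = flip_bit(weight, 30)  # most significant exponent bit
flipped_mantissa = flip_bit(weight, 0)   # least significant mantissa bit
flipped_sign = flip_bit(weight, 31)      # sign bit

print(flipped_exponent, flipped_mantissa, flipped_sign)
```

Under this assumption, flipping the top exponent bit turns 0.5 into $2^{127}$ (about $1.7 \times 10^{38}$), while flipping the lowest mantissa bit changes the value by only $2^{-24}$. Which bit is hit therefore matters far more than the fact that a flip occurred.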
