Scale-Dropout: Estimating Uncertainty in Deep Neural Networks Using Stochastic Scale

Soyed Tuhin Ahmed; Kamal Danouchi; Michael Hefenbrock; Guillaume Prenat; Lorena Anghel; Mehdi B. Tahoori

TL;DR

This work tackles the challenge of uncertainty estimation in neural networks on resource-constrained devices by introducing Scale-Dropout for Binary Neural Networks and a Monte Carlo Scale-Dropout-based Bayesian framework. The authors implement this approach on a spintronic CIM architecture using a single stochastic module, achieving over 100x energy savings and robust uncertainty estimates. Empirical results on CIFAR-10 and biomedical segmentation demonstrate competitive predictive accuracy and strong out-of-distribution detection, with improved uncertainty calibration and reduced Monte Carlo sampling requirements. The combination of a learnable scale vector, unitary dropout, and spintronic CIM yields a practical pathway for reliable, edge-friendly Bayesian inference in deep learning systems.

Abstract

Uncertainty estimation in Neural Networks (NNs) is vital for improving the reliability of and confidence in predictions, particularly in safety-critical applications. Bayesian Neural Networks (BayNNs) with Dropout as an approximation offer a systematic approach to quantifying uncertainty, but they inherently suffer from high hardware overhead in terms of power, memory, and computation. Thus, applying BayNNs to edge devices with limited resources or to high-performance applications is challenging. Some of the inherent costs of BayNNs can be reduced by binarizing their parameters and accelerating them in hardware on a Computation-In-Memory (CIM) architecture with spintronic memories. However, numerous stochastic units are required to implement conventional dropout-based BayNNs. In this paper, we propose Scale-Dropout, a novel regularization technique for Binary Neural Networks (BNNs), and Monte Carlo Scale-Dropout (MC-Scale Dropout)-based BayNNs for efficient uncertainty estimation. Our approach requires only one stochastic unit for the entire model, irrespective of the model size, leading to a highly scalable Bayesian NN. Furthermore, we introduce a novel spintronic memory-based CIM architecture for the proposed BayNN that achieves more than $100\times$ energy savings compared to the state-of-the-art. In our validation, the method shows up to a $1\%$ improvement in predictive performance and superior uncertainty estimates compared to related works.
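The idea of one stochastic unit shared by the whole model can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that a single Bernoulli draw with probability $p$ per forward pass decides whether every layer applies its scale vector, that dropped scaling means a factor of one, and that a per-layer bias stands in for the batch-normalization shift. The names `binarize`, `forward`, `mc_scale_dropout_predict`, and `make_toy_layers` are hypothetical, and the weights are random toy values rather than trained parameters.

```python
import numpy as np

def binarize(w):
    """Sign binarization to {-1, +1}; zeros map to +1."""
    return np.where(w >= 0, 1.0, -1.0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, layers, scaled):
    """One pass through binary linear layers.

    layers: list of (w_bin, alpha, bias) tuples.
    scaled: if True every layer multiplies by its scale vector alpha;
            if False the scale is dropped (treated as a factor of one).
    """
    h = x
    for i, (w_bin, alpha, bias) in enumerate(layers):
        z = h @ w_bin
        if scaled:
            z = z * alpha
        z = z + bias  # assumption: bias stands in for the batch-norm shift
        # Binary activation on hidden layers, raw logits on the last layer.
        h = binarize(z) if i < len(layers) - 1 else z
    return softmax(h)

def mc_scale_dropout_predict(x, layers, p, n_passes, rng):
    """Monte Carlo inference: one shared Bernoulli(p) draw per forward pass."""
    samples = np.stack(
        [forward(x, layers, scaled=rng.random() < p) for _ in range(n_passes)]
    )
    return samples.mean(axis=0), samples

def make_toy_layers(rng, sizes):
    """Random toy parameters (illustrative only, not trained)."""
    return [
        (
            binarize(rng.standard_normal((n_in, n_out))),
            rng.uniform(0.5, 1.5, n_out),
            rng.standard_normal(n_out),
        )
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

rng = np.random.default_rng(0)
layers = make_toy_layers(rng, [16, 8, 3])
x = rng.standard_normal((4, 16))
mean_probs, all_probs = mc_scale_dropout_predict(x, layers, p=0.8, n_passes=20, rng=rng)
```

In this sketch a point-estimate prediction corresponds to calling `forward(x, layers, scaled=True)` once, while the spread of `all_probs` across passes serves as the uncertainty signal.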

Paper Structure (44 sections, 20 equations, 9 figures, 10 tables)

Introduction
Preliminaries
Binary NNs and Scaling
Uncertainty Estimation
Conventional Dropout Methods
Bayesian Neural Networks
Spintronic Memory Technology
Related Works
Proposed approach
Scale Vector
Scale Dropout Model Description
Co-adaptation Mitigation
Choosing Dropout Probability
Learning with Scale-Dropout
Scale-Dropout as a Bayesian Approximation
...and 29 more sections

Figures (9)

Figure 1: Several nodes (neurons) a) at training time, scaled with a probability of $p$ and dropped (ignored) with a probability of $1-p$; b) at test time, where all nodes are always scaled if a point-estimate prediction is preferred. For Bayesian inference, however, all nodes behave as they do at training time. All nodes are connected to the weights of the next layer after non-linear activation and batch normalization, and their shapes represent scaling factors. All dropped nodes have the same shape, indicating no scaling factor.
Figure 2: Spin Scale-Dropout Module based on SOT MTJ.
Figure 3: Binary SOT crossbar array for the Bayesian inference.
Figure 4: Proposed inference architecture for Scale-Dropout.
Figure 5: Detecting Distribution Shift on CIFAR-10: a) a scatter and b) the 95% confidence interval of 100 forward passes of the softmax input (logits) and output for the Scale-Dropout VGG topology. Uniform noise of increasing strength is added to a randomly sampled image of a ship (labeled as 8). The uncertainty of the prediction increases with the data distribution shift, as shown by the high softmax scatter and the confidence interval. Although the model uncertainty is extremely high (best observed in color), the inputs for images 5 through 12 are classified as either a truck (labeled as 9) or a bird (labeled as 3).
...and 4 more figures
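The kind of summary plotted in Figure 5 can be sketched as follows: given the softmax outputs of repeated stochastic forward passes for one input, compute the per-class mean and a 95% interval. This is an illustration only. The caption does not say how the interval is computed, so an empirical 2.5 to 97.5 percentile interval is assumed here. The function name `summarize_mc_passes` is hypothetical, and the input is synthetic Dirichlet noise, not model output.

```python
import numpy as np

def summarize_mc_passes(probs):
    """Summarize Monte Carlo softmax samples for a single input.

    probs: array of shape (n_passes, n_classes), each row one forward pass.
    Returns per-class mean, empirical 95% interval bounds, interval width,
    and the class with the highest mean probability.
    """
    probs = np.asarray(probs, dtype=float)
    mean = probs.mean(axis=0)
    # Assumption: percentile-based interval over the passes.
    lower, upper = np.percentile(probs, [2.5, 97.5], axis=0)
    return {
        "mean": mean,
        "lower": lower,
        "upper": upper,
        "width": upper - lower,  # wider interval suggests higher uncertainty
        "prediction": int(mean.argmax()),
    }

# Synthetic stand-in for 100 forward passes over 10 classes (not real results).
rng = np.random.default_rng(0)
synthetic_probs = rng.dirichlet(np.ones(10), size=100)
summary = summarize_mc_passes(synthetic_probs)
```

A wide interval on the predicted class is one simple signal that a prediction should not be trusted, which is the behavior the caption describes for heavily perturbed inputs.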
