Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights

Soyed Tuhin Ahmed, Michael Hefenbrock, Mehdi B. Tahoori

TL;DR

Tiny-DE addresses the critical need for uncertainty estimation on edge AI by ensembling only normalization layers with shared backbone weights, enabling single-pass inference and training. The EnsembleNorm mechanism allows per-member statistics and affine parameters, delivering diverse internal states without the cost of full-model ensembles. Empirical results across classification, regression, time-series, and segmentation show competitive accuracy and improved OoD uncertainty, with hardware overhead close to a single model and substantial reductions relative to naive ensembles. This approach offers practical impact for reliable, low-latency AI on battery-powered devices and specialized accelerators.
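The idea of keeping one backbone and ensembling only the normalization parameters can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the authors' EnsembleNorm implementation: the two-layer network, the function names (`normalize`, `tiny_ensemble_forward`, `mean_and_variance`), the random initialization, and the use of per-sample (layer-norm-style) statistics in place of per-member running statistics are all hypothetical choices made for brevity.

```python
import math
import random

def normalize(h, gamma, beta, eps=1e-5):
    """Normalize a feature vector, then apply ONE member's affine parameters.

    Assumption: per-sample statistics are used here for simplicity; per-member
    running statistics are not modeled.
    """
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(h, gamma, beta)]

def linear(x, weight, bias):
    """Dense layer whose weights and biases are shared by every member."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def tiny_ensemble_forward(x, shared, members):
    """Return one output vector per ensemble member.

    Only the (gamma, beta) pairs differ between members; w1, b1, w2, b2 are
    stored once.
    """
    pre_norm = linear(x, shared["w1"], shared["b1"])  # identical for all members
    outputs = []
    for gamma, beta in members:
        h = normalize(pre_norm, gamma, beta)          # member-specific step
        h = [max(0.0, v) for v in h]                  # ReLU
        outputs.append(linear(h, shared["w2"], shared["b2"]))
    return outputs

def mean_and_variance(outputs):
    """Ensemble mean as the prediction, per-output variance as the uncertainty."""
    m, dim = len(outputs), len(outputs[0])
    mean = [sum(o[i] for o in outputs) / m for i in range(dim)]
    var = [sum((o[i] - mean[i]) ** 2 for o in outputs) / m for i in range(dim)]
    return mean, var

# Hypothetical toy sizes: 4 inputs, 8 hidden units, 3 outputs, M = 4 members.
random.seed(0)
IN, HID, OUT, M = 4, 8, 3, 4
shared = {
    "w1": [[random.gauss(0, 1) for _ in range(IN)] for _ in range(HID)],
    "b1": [0.0] * HID,
    "w2": [[random.gauss(0, 1) for _ in range(HID)] for _ in range(OUT)],
    "b2": [0.0] * OUT,
}
members = [
    ([random.uniform(0.5, 1.5) for _ in range(HID)],
     [random.uniform(-0.5, 0.5) for _ in range(HID)])
    for _ in range(M)
]

x = [0.2, -1.0, 0.7, 1.5]
outputs = tiny_ensemble_forward(x, shared, members)
mean, var = mean_and_variance(outputs)
print("prediction:", mean)
print("uncertainty:", var)
```

In this toy, the members disagree only because their normalization parameters differ, and the spread of their outputs serves as the uncertainty signal. On hardware that supports batch processing, the loop over members could in principle be replaced by a single batched pass, which is the setting the summary above describes.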

Abstract

Artificial intelligence (AI) applications are evolving rapidly and are increasingly deployed in safety-critical domains, such as autonomous driving and medical diagnosis, where functional safety is paramount. In AI-driven systems, uncertainty estimation helps the user avoid overconfident predictions and achieve functional safety, thereby improving the robustness and reliability of model predictions. However, conventional uncertainty estimation methods, such as the deep ensemble method, impose high computational and, accordingly, hardware (latency and energy) overhead because they require the storage and processing of multiple models. Alternatively, Monte Carlo dropout (MC-dropout) methods, although they have low memory overhead, require numerous ($\sim 100$) forward passes, leading to high computational overhead and latency. Thus, these approaches are not suitable for battery-powered edge devices with limited computing and memory resources. In this paper, we propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices. In our approach, only normalization layers are ensembled $M$ times, with all ensemble members sharing common weights and biases, leading to a significant decrease in storage requirements and latency. Moreover, our approach requires only one forward pass in a hardware architecture that allows batch processing for inference and uncertainty estimation. Furthermore, its memory overhead is approximately the same as that of a single model. Therefore, latency and memory overhead are reduced by a factor of up to $\sim M\times$ relative to a deep ensemble. Nevertheless, our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ across various benchmark datasets, tasks, and state-of-the-art architectures.
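The memory claim can be made concrete with a back-of-envelope parameter count for a single layer. The numbers below are purely illustrative assumptions, not figures from the paper: a hypothetical 3x3 convolution with 64 input and 64 output channels, followed by a normalization layer that stores four values per channel (scale, shift, running mean, running variance), and $M = 5$ members.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of one convolutional layer."""
    return c_in * c_out * kernel * kernel + c_out

def norm_params(channels):
    """Assumed per-member cost: scale, shift, running mean, running variance."""
    return 4 * channels

def full_ensemble(c_in, c_out, kernel, m):
    """Deep ensemble: every member stores its own conv and norm layer."""
    return m * (conv_params(c_in, c_out, kernel) + norm_params(c_out))

def shared_weight_ensemble(c_in, c_out, kernel, m):
    """Shared conv weights, with only the normalization layer replicated m times."""
    return conv_params(c_in, c_out, kernel) + m * norm_params(c_out)

# Hypothetical layer and ensemble size, chosen only for illustration.
C_IN, C_OUT, K, M = 64, 64, 3, 5

single = full_ensemble(C_IN, C_OUT, K, 1)
full = full_ensemble(C_IN, C_OUT, K, M)
tiny = shared_weight_ensemble(C_IN, C_OUT, K, M)

print(f"single model:           {single}")
print(f"deep ensemble (M={M}):    {full}")
print(f"shared-weight ensemble: {tiny}")
print(f"reduction vs. deep ensemble: {full / tiny:.2f}x")
print(f"overhead vs. single model:   {tiny / single:.3f}x")
```

Because the convolution dominates the count, replicating only the normalization layer adds a few percent over a single model for this layer, while the full ensemble costs $M$ times as much. The exact ratios depend on the architecture and on which normalization buffers are stored per member.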

Paper Structure (23 sections, 3 equations, 10 figures, 3 tables, 2 algorithms)

This paper contains 23 sections, 3 equations, 10 figures, 3 tables, 2 algorithms.

Figures (10)

  • Figure 1: a) Deep Ensemble with $M$ ensemble members, b) BatchEnsemble, and the proposed Tiny-DE model with $M$ normalization layers and a single shared convolutional layer in c) serial mode and d) parallel mode.
  • Figure 2: a) Number of parameters in each layer and b) Share of parameter groups with respect to the total number of parameters in ResNet-32.
  • Figure 3: Sketch of the proposed Tiny-DE architecture based on the popular CNN architectures ResNet (He et al., 2016) and VGG (Simonyan and Zisserman, 2014). We only show the four signature layers of a specific topology. Our proposed topology is generalizable across existing topologies, with only the addition of a router before the normalization layers. In the case of our proposed approach in batch mode, no change is required in the topology.
  • Figure 4: Uncertainty distributions for the Tiny-DE approach on CIFAR-10, including ID CIFAR-10 and OoD datasets such as rotated CIFAR-10, SVHN, and STL. Notably, larger ensembles show an increased relative change in the uncertainty distribution from ID compared to a single model ($M = 1$).
  • Figure 5: ID and OoD Max Disagreement distributions for the Tiny-DE approach trained on clean CIFAR-100 (ID). Notably, larger ensembles show an increased relative change in the uncertainty distribution from ID.
  • ...and 5 more figures