QUTE: Quantifying Uncertainty in TinyML with Early-exit-assisted ensembles for model-monitoring

Nikhil P Ghanathe; Steven J E Wilton

TL;DR

QUTE is a novel, resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. It delivers superior uncertainty quality on tiny models and comparable performance on larger models, with model sizes 59% smaller than the closest prior work.

Abstract

Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized tinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network and distills early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models and achieves comparable performance on larger models with 59% smaller model sizes than the closest prior work. When deployed on a microcontroller, QUTE demonstrates a 31% reduction in latency on average. In addition, we show that QUTE excels at detecting accuracy-drop events, outperforming all prior works.
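
As a rough illustration of how an ensemble of output blocks can yield an uncertainty signal in a single forward pass, the sketch below averages the softmax outputs of several heads and scores the result with predictive entropy. The averaging rule, the entropy score, the function names, and the logits are all assumptions made for illustration; the abstract does not specify how QUTE aggregates its output blocks.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(head_logits):
    """Average the softmax outputs of several output heads.

    Assumption: simple mean aggregation, a common choice for ensembles,
    not necessarily the rule used by QUTE.
    """
    probs = [softmax(logits) for logits in head_logits]
    n_heads = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_heads for c in range(n_classes)]

def predictive_entropy(probs):
    """Entropy (in nats) of a class distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits from three output heads for a 3-class problem.
agree = [[4.0, 0.0, 0.0], [3.5, 0.2, 0.1], [4.2, -0.1, 0.0]]
disagree = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]

mean_agree = ensemble_predict(agree)
mean_disagree = ensemble_predict(disagree)

print("entropy when heads agree:   ", predictive_entropy(mean_agree))
print("entropy when heads disagree:", predictive_entropy(mean_disagree))
```

When the heads disagree, the averaged distribution flattens and its entropy rises. A monitor running without true labels could track such a score over time as one possible proxy for an accuracy drop.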

Paper Structure (28 sections, 6 equations, 7 figures, 12 tables)

Introduction
Related Work
Background and Problem formulation
QUTE
Evaluation Methodology
Results
MCU Fit: QUTE vs Resource-Heavy methods
Accuracy-drop detection
Failure detection
Uncertainty quantification
Effectiveness of EV-assistance method and its effect on convergence
Comparison with Temperature scaling
Comparison with single-pass deterministic methods
Conclusion and Discussion
Appendix
...and 13 more sections

Figures (7)

Figure 1: QUTE architecture. $\{f_{out_k}\}_{k=1}^{K}$ represents the $K$ additional output blocks at the final exit, which are assisted by $K$ early-exit blocks $\{g_{\theta_k}\}_{k=1}^{K}$ only during training to promote diversity (see Figure 2). For inference, all (early-exits & $f_{out}$) are removed.
Figure 2: Early-view-assistance architecture. One assisting early-exit $g_{\theta_k}$ and the corresponding early-view-assisted exit $f_{out_k}$ are shown. Early-exit weights $\theta_k$ are copied to $h_{\phi_k}$ before each training batch, i.e., $\phi_k = \theta_k$.
Figure 3: Microcontroller results for SpeechCmd (sp-cmd) and CIFAR10 (cfr10) on Big-MCU and Small-MCU (lower is better). EE-ensemble and DEEP on CIFAR10 do not fit, i.e., they run out of memory (OOM).
Figure 4: Accuracy-drop detection results. AUPRC is reported for all evaluated baselines (higher is better)
Figure 5: Batch loss at EV-0 for CIFAR10 on ResNet-8
...and 2 more figures
