QUTE: Quantifying Uncertainty in TinyML with Early-exit-assisted ensembles for model-monitoring

Nikhil P Ghanathe, Steven J E Wilton

TL;DR

QUTE is a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. It delivers superior uncertainty quality on tiny models and achieves comparable performance on larger models with 59% smaller model sizes than the closest prior work.

Abstract

Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized TinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network, distilling early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models, achieving comparable performance on larger models with 59% smaller model sizes than the closest prior work. When deployed on a microcontroller, QUTE demonstrates a 31% reduction in latency on average. In addition, we show that QUTE excels at detecting accuracy-drop events, outperforming all prior works.
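
The abstract describes an ensemble formed by several output blocks at the final exit, but it does not say how their outputs are combined into an uncertainty estimate. As a minimal sketch, assuming the common choice of averaging the softmax outputs of the K blocks and using the entropy of the averaged distribution as the uncertainty score, the computation could look like the following. The function names and the plain-list representation of logits are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(head_logits):
    """Combine K output blocks (assumed: simple averaging of softmax outputs).

    head_logits: list of K lists, one list of class logits per output block.
    Returns (mean_probs, entropy), where entropy is the predictive entropy
    in nats of the averaged distribution.
    """
    probs = [softmax(logits) for logits in head_logits]
    k = len(probs)
    num_classes = len(probs[0])
    mean_probs = [sum(p[c] for p in probs) / k for c in range(num_classes)]
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy

# Two blocks that agree yield lower entropy than two blocks that disagree.
_, agree = ensemble_predict([[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
_, disagree = ensemble_predict([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
print(agree < disagree)
```

In this sketch, disagreement between output blocks raises the entropy of the averaged prediction, which is the kind of signal an on-device monitor could threshold when no true labels are available.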

Paper Structure

This paper contains 28 sections, 6 equations, 7 figures, and 12 tables.

Figures (7)

  • Figure 1: QUTE architecture. $\{f_{out}\}_{k=1}^{K}$ represents the 'K' additional output blocks at the final exit, which are assisted by 'K' early-exit blocks $\{g_{\theta_k}\}_{k=1}^{K}$ only during training to promote diversity (see Figure 2). For inference, all (early-exits & $f_{out}$) are removed
  • Figure 2: Early-view-assistance architecture. One assisting early-exit $g_{\theta_k}$ and the corresponding early-view-assisted exit $f_{out_k}$ are shown. Early-exit weights $\theta_k$ are transferred/copied to $h_{\phi_k}$ before each train batch, i.e., $\phi_k = \theta_k$
  • Figure 3: Microcontroller results for SpeechCmd (sp-cmd) and CIFAR10 (cfr10) on Big-MCU and Small-MCU (lower is better). EE-ensemble and DEEP on CIFAR10 do not fit, i.e., they run out of memory (OOM)
  • Figure 4: Accuracy-drop detection results. AUPRC is reported for all evaluated baselines (higher is better)
  • Figure 5: Batch-loss at EV-0 for CIFAR10 on Resnet-8
  • ...and 2 more figures
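
Figure 4 reports AUPRC for accuracy-drop detection. How the paper computes this value is not stated in this listing. As a rough illustration of the metric only, the sketch below uses average precision, one common estimator of the area under the precision-recall curve. It assumes each monitored window has an uncertainty score and a binary label marking whether an accuracy-drop event occurred. The function name and the input layout are hypothetical.

```python
def average_precision(scores, labels):
    """Average precision, a common estimator of AUPRC.

    scores: uncertainty score per window (higher = more likely a drop).
    labels: 1 if the window is a true accuracy-drop event, else 0.
    Ties in scores are broken by input order (an arbitrary choice here).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank windows from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos

# A detector that ranks every drop event above every normal window scores 1.0.
print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

A value closer to 1 means the uncertainty scores rank true accuracy-drop events ahead of normal windows, which matches the "higher is better" reading in the Figure 4 caption.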