SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML

Ismail Lamaakal, Chaymae Yahyati, Khalid El Makkaoui, Ibrahim Ouahbi, Yassine Maleh

TL;DR

SNAP-UQ tackles uncertainty estimation in TinyML by introducing a self-supervised, depth-wise next-activation predictor that attaches tiny int8 heads to a few backbone layers. It computes per-layer standardized surprisal under a diagonal Gaussian and aggregates these into a single depth-wise energy S(x), which is mapped to an actionable uncertainty U(x) via a monotone calibrator, enabling single-pass, state-free inference with no buffers. The method is MCU-friendly, adds only tens of kilobytes of flash, and demonstrates strong performance on corrupted streams, CID/OOD detection, and ID calibration across vision and audio backbones, often outperforming heavier baselines under tight TinyML budgets. By tying uncertainty to internal feature dynamics rather than solely output confidence, SNAP-UQ offers a robust, deployable monitoring signal for on-device robustness and safe fallback in resource-constrained environments.

Abstract

This paper proposes a novel and practical method, SNAP-UQ, for single-pass, label-free uncertainty estimation based on depth-wise next-activation prediction. SNAP-UQ taps a small set of backbone layers and uses tiny int8 heads to predict the mean and scale of the next activation from a low-rank projection of the previous one; the resulting standardized prediction error forms a depth-wise surprisal signal that is aggregated and mapped through a lightweight monotone calibrator into an actionable uncertainty score. The design introduces no temporal buffers or auxiliary exits and preserves state-free inference, while increasing deployment footprint by only a few tens of kilobytes. Across vision and audio backbones, SNAP-UQ reduces flash and latency relative to early-exit and deep-ensemble baselines (typically $\sim$40--60% smaller and $\sim$25--35% faster), with several competing methods at similar accuracy often exceeding MCU memory limits. On corrupted streams, it improves accuracy-drop event detection by multiple AUPRC points and maintains strong failure detection (AUROC $\approx 0.9$) in a single forward pass. By grounding uncertainty in layer-to-layer dynamics rather than solely in output confidence, SNAP-UQ offers a novel, resource-efficient basis for robust TinyML monitoring.
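The scoring path described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the heads are taken to be plain linear maps in float (the paper uses int8 heads), the aggregate is an unweighted sum of per-tap surprisals, and the calibrator is a two-parameter logistic. The names `layer_surprisal`, `snap_uq_score`, `W_mu`, `W_ls`, `beta0`, and `beta1` are hypothetical.

```python
import numpy as np

def layer_surprisal(a_prev, a_next, P, W_mu, b_mu, W_ls, b_ls):
    """Standardized squared error of a_next under a predicted diagonal Gaussian."""
    z = P @ a_prev                 # low-rank projection of the previous activation
    mu = W_mu @ z + b_mu           # predicted mean of the next activation
    log_var = W_ls @ z + b_ls      # predicted log-variance (diagonal scale)
    return float(np.sum((a_next - mu) ** 2 * np.exp(-log_var)))

def snap_uq_score(taps, heads, beta0=0.0, beta1=1.0):
    """Sum per-tap surprisals into S(x), then map S(x) to U(x) in [0, 1]."""
    S = sum(layer_surprisal(a_prev, a_next, **h)
            for (a_prev, a_next), h in zip(taps, heads))
    # Assumed calibrator: a logistic that is monotone in S when beta1 > 0.
    U = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * S)))
    return S, float(U)

# Toy check with one tap: an activation equal to the predicted mean gives
# zero surprisal, while a shifted activation gives a larger score.
rng = np.random.default_rng(0)
d_prev, d_next, rank = 16, 8, 4
head = dict(
    P=rng.normal(size=(rank, d_prev)),
    W_mu=rng.normal(size=(d_next, rank)),
    b_mu=np.zeros(d_next),
    W_ls=np.zeros((d_next, rank)),   # unit variance for every coordinate
    b_ls=np.zeros(d_next),
)
a_prev = rng.normal(size=d_prev)
a_expected = head["W_mu"] @ (head["P"] @ a_prev)
S_lo, U_lo = snap_uq_score([(a_prev, a_expected)], [head], beta0=-2.0)
S_hi, U_hi = snap_uq_score([(a_prev, a_expected + 3.0)], [head], beta0=-2.0)
```

Everything here is computed from activations already produced by the forward pass, which is what lets the score stay single-pass and state-free: no buffer of past inputs is read or written.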

Paper Structure

This paper contains 147 sections, 3 theorems, 5 equations, 12 figures, 31 tables, 5 algorithms.

Key Result

Proposition 2.1

If $p_\theta(a_\ell\mid a_{\ell-1})=\mathcal{N}(\mu_\ell,\mathrm{diag}(\sigma_\ell^2))$ as above, then $-\log p_\theta(a_\ell\mid a_{\ell-1})=\tfrac{1}{2}\,e_\ell(x)+\tfrac{1}{2}\sum_{i=1}^{d_\ell}\log \sigma_{\ell,i}^2 + \text{const}$, so $S(x)$ is (up to additive/multiplicative constants and layer

Figures (12)

  • Figure 1: SNAP-UQ pipeline. A standard backbone $f_1,\dots,f_D$ is tapped at a small set of layers to expose activations $a_\ell$. For each tap, a lightweight projector $P_\ell a_{\ell-1}$ feeds a tiny predictor $g_\ell$ that outputs next-activation statistics $(\mu_\ell,\log\sigma_\ell^{2})$. The per-layer surprisal $e_\ell$ is a standardized squared error between the realized $a_\ell$ and $\mu_\ell$ (diagonal scale $\sigma_\ell^{2}$), and the single-pass uncertainty proxy aggregates these as $S(x)=\sum_{\ell} e_\ell$. A monotone logistic head maps $S(x)$ to a calibrated score $U(x)\!\in\![0,1]$; optionally, the head can blend instantaneous classifier confidence signals (maximum probability $C_\phi$ and margin $m^{\mathrm{mg}}$) for added separability. Dashed boxes indicate training-only steps where $\{g_\ell\}$ are fitted offline; inference remains one forward pass, state-free, and MCU-friendly.
  • Figure 2: CIFAR-10-C: AUPRC vs. corruption severity. SNAP-UQ scales fastest with severity.
  • Figure 3: Rank sensitivity. Accuracy-drop detection improves with projector rank; latency impact is small (CIFAR-10/Big-MCU).
  • Figure 4: Risk--coverage (CIFAR-10-C). Isotonic improves budgeted operation [Angelopoulos2021RAPS].
  • Figure 5: Risk--coverage on two datasets (lower is better). SNAP dominates at moderate to high coverage.
  • ...and 7 more figures

Theorems & Definitions (3)

  • Proposition 2.1: Surprisal--likelihood equivalence under diagonal-Gaussian
  • Proposition 2.2: Relation to Mahalanobis scores
  • Proposition 2.3: Affine invariance (scale)
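
The identity in Proposition 2.1 can be checked numerically. The sketch below assumes that $e_\ell(x)=\sum_i (a_{\ell,i}-\mu_{\ell,i})^2/\sigma_{\ell,i}^2$, which matches the "standardized squared error" described in the Figure 1 caption but is not spelled out in this excerpt, and it takes the constant to be the usual Gaussian normalizer $\tfrac{d_\ell}{2}\log 2\pi$. The function names are hypothetical.

```python
import math
import numpy as np

def gaussian_nll(a, mu, log_var):
    """Exact -log N(a; mu, diag(exp(log_var))), summed coordinate by coordinate."""
    var = np.exp(log_var)
    log_pdf = -0.5 * (np.log(2 * math.pi * var) + (a - mu) ** 2 / var)
    return float(-np.sum(log_pdf))

def surprisal_form(a, mu, log_var):
    """Right-hand side of Proposition 2.1: e/2 + (1/2) * sum(log sigma^2) + const."""
    e = np.sum((a - mu) ** 2 * np.exp(-log_var))    # assumed definition of e_l(x)
    const = 0.5 * a.size * math.log(2 * math.pi)    # (d/2) * log(2*pi)
    return float(0.5 * e + 0.5 * np.sum(log_var) + const)

rng = np.random.default_rng(1)
a, mu, log_var = (rng.normal(size=6) for _ in range(3))
gap = abs(gaussian_nll(a, mu, log_var) - surprisal_form(a, mu, log_var))
```

The two expressions agree up to floating-point error, so under these assumptions ranking inputs by $e_\ell$ plus the log-variance term is the same as ranking them by negative log-likelihood.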