SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML
Ismail Lamaakal, Chaymae Yahyati, Khalid El Makkaoui, Ibrahim Ouahbi, Yassine Maleh
TL;DR
SNAP-UQ estimates uncertainty in TinyML with a self-supervised, depth-wise next-activation predictor built from tiny int8 heads attached to a few backbone layers. It computes per-layer standardized surprisal under a diagonal Gaussian and aggregates these into a single depth-wise energy S(x), which a monotone calibrator maps to an actionable uncertainty U(x). Inference stays single-pass and state-free, with no temporal buffers. The method is MCU-friendly, adds only tens of kilobytes of flash, and performs well on corrupted streams, CID/OOD detection, and ID calibration across vision and audio backbones, often outperforming heavier baselines under tight TinyML budgets. By tying uncertainty to internal feature dynamics rather than solely to output confidence, SNAP-UQ offers a robust, deployable monitoring signal for on-device robustness and safe fallback in resource-constrained environments.
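One way to write down the quantities named above, as a hedged sketch rather than the authors' exact formulation: the symbols $a_\ell$ (activation at tapped layer $\ell$, of dimension $d_\ell$), $P_\ell$ (low-rank projection), $h_\ell$ (prediction head), $\mathcal{T}$ (set of tapped layers), $w_\ell$ (layer weights), and $g$ (calibrator) are notational assumptions introduced here for illustration.

```latex
\begin{align}
(\mu_\ell, \log\sigma_\ell) &= h_\ell\!\left(P_\ell\, a_{\ell-1}\right), \\
e_\ell(x) &= \frac{1}{d_\ell} \sum_{i=1}^{d_\ell}
             \left( \frac{a_{\ell,i} - \mu_{\ell,i}}{\sigma_{\ell,i}} \right)^{2}, \\
S(x) &= \sum_{\ell \in \mathcal{T}} w_\ell\, e_\ell(x), \qquad
U(x) = g\big(S(x)\big), \quad g \text{ monotone non-decreasing}.
\end{align}
```

Here $e_\ell$ is the mean squared standardized error of the observed activation under the predicted diagonal Gaussian. A full Gaussian negative log-likelihood would also carry a $\log\sigma_{\ell,i}$ term; whether the method includes it in the surprisal is not stated in this summary.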
Abstract
This paper proposes SNAP-UQ, a practical method for single-pass, label-free uncertainty estimation based on depth-wise next-activation prediction. SNAP-UQ taps a small set of backbone layers and uses tiny int8 heads to predict the mean and scale of the next activation from a low-rank projection of the previous one. The resulting standardized prediction error forms a depth-wise surprisal signal that is aggregated and mapped through a lightweight monotone calibrator into an actionable uncertainty score. The design introduces no temporal buffers or auxiliary exits and preserves state-free inference, while increasing the deployment footprint by only a few tens of kilobytes. Across vision and audio backbones, SNAP-UQ reduces flash and latency relative to early-exit and deep-ensemble baselines (typically $\sim$40--60% smaller and $\sim$25--35% faster); several competing methods at similar accuracy often exceed MCU memory limits. On corrupted streams, it improves accuracy-drop event detection by multiple AUPRC points and maintains strong failure detection (AUROC $\approx 0.9$) in a single forward pass. By grounding uncertainty in layer-to-layer dynamics rather than solely in output confidence, SNAP-UQ offers a resource-efficient basis for robust TinyML monitoring.
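The pipeline described in the abstract can be sketched in a few lines of plain Python. This is an illustrative float sketch under stated assumptions, not the authors' int8 implementation: the function names, the dictionary layout of a head (`P`, `W_mu`, `b_mu`, `W_ls`, `b_ls`), the uniform default layer weights, and the logistic form of the calibrator are all hypothetical choices made here for clarity.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def layer_surprisal(prev_act, next_act, head):
    """Mean squared standardized error of next_act under the head's
    predicted diagonal Gaussian (assumed form of per-layer surprisal)."""
    z = matvec(head["P"], prev_act)  # low-rank projection of previous activation
    mu = [m + b for m, b in zip(matvec(head["W_mu"], z), head["b_mu"])]
    log_sigma = [s + b for s, b in zip(matvec(head["W_ls"], z), head["b_ls"])]
    sq = [((a - m) / math.exp(ls)) ** 2
          for a, m, ls in zip(next_act, mu, log_sigma)]
    return sum(sq) / len(sq)

def depthwise_energy(tapped_acts, heads, weights=None):
    """S(x): weighted sum of surprisals over consecutive tapped activations.
    Uniform weights are an assumption when none are given."""
    pairs = list(zip(tapped_acts[:-1], tapped_acts[1:]))
    if len(pairs) != len(heads):
        raise ValueError("need exactly one head per consecutive pair of taps")
    if weights is None:
        weights = [1.0 / len(heads)] * len(heads)
    return sum(w * layer_surprisal(p, n, h)
               for w, (p, n), h in zip(weights, pairs, heads))

def calibrate(energy, beta0=0.0, beta1=1.0):
    """Monotone map from S(x) to U(x) in (0, 1). A logistic curve is one
    possible monotone calibrator, chosen here only for illustration."""
    if beta1 < 0:
        raise ValueError("beta1 must be non-negative to keep the map monotone")
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * energy)))

# Toy usage: one head between two tapped layers (all values made up).
head = {
    "P": [[1.0, 0.0, 0.0]],      # projects a 3-dim activation to rank 1
    "W_mu": [[2.0], [0.0]],
    "b_mu": [0.0, 1.0],
    "W_ls": [[0.0], [0.0]],      # log-scale 0, so sigma = 1
    "b_ls": [0.0, 0.0],
}
a0, a1 = [1.0, 2.0, 3.0], [2.0, 3.0]
S = depthwise_energy([a0, a1], [head])
print("S(x) =", S, " U(x) =", calibrate(S))
```

Everything runs inside a single forward pass over the tapped activations and keeps no state between inputs, which mirrors the state-free, buffer-free property claimed above. On a microcontroller the heads would be quantized and the calibrator could be replaced by a lookup table.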
