How Reliable is Your Service at the Extreme Edge? Analytical Modeling of Computational Reliability
MHD Saria Allahham, Hossam S. Hassanein
TL;DR
This work quantifies the reliability of AI inference streaming on consumer-owned devices in extreme edge computing, defining reliability as the probability that instantaneous capacity meets demand at a QoS threshold $\Theta$. It develops two information regimes: Minimal Information (MI), which models capacity and demand using uniform distributions over declared bounds, and Historical Data, which fits truncated normal distributions via Maximum Likelihood Estimation (MLE). For both regimes it derives closed-form single-device reliability and extends the results to series, parallel, and partitioned multi-device configurations. It also introduces optimal workload partitioning, based on equalizing marginal log-reliability across devices, and device-selection bounds for achieving a target reliability $\varepsilon$. The framework is validated on real-time object detection with YOLO11m, showing strong agreement between analytical predictions, Monte Carlo sampling, and empirical measurements, and demonstrating throughput gains from distributed configurations. Overall, the approach gives orchestrators tractable tools to assess feasibility, allocate work, and select devices for reliable distributed streaming at the extreme edge.
Abstract
Extreme Edge Computing (XEC) distributes streaming workloads across consumer-owned devices, exploiting their proximity to users and ubiquitous availability. Many such workloads are AI-driven, requiring continuous neural network inference for tasks like object detection and video analytics. Distributed Inference (DI), which partitions model execution across multiple edge devices, enables these streaming services to meet strict throughput and latency requirements. Yet consumer devices exhibit volatile computational availability due to competing applications and unpredictable usage patterns. This volatility poses a fundamental challenge: how can we quantify the probability that a device, or an ensemble of devices, will maintain the processing rate required by a streaming service? This paper presents an analytical framework for computational reliability in XEC, defined as the probability that instantaneous capacity meets demand at a specified Quality of Service (QoS) threshold. We derive closed-form reliability expressions under two information regimes: Minimal Information (MI), which requires only declared operational bounds, and Historical Data, which refines estimates via Maximum Likelihood Estimation (MLE) from past observations. The framework extends to multi-device deployments, providing reliability expressions for series, parallel, and partitioned workload configurations. We derive optimal workload allocation rules and analytical bounds for device selection, equipping orchestrators with tractable tools to evaluate deployment feasibility and configure distributed streaming systems. We validate the framework using real-time object detection with the YOLO11m model as a representative DI streaming workload; experiments on emulated XED environments demonstrate close agreement between analytical predictions, Monte Carlo sampling, and empirical measurements across diverse capacity and demand configurations.
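To make the reliability definition concrete, the following is a minimal illustrative sketch, not the paper's derivation. It assumes a simplified Minimal Information setting in which a device's capacity is uniform over its declared bounds and demand is a fixed rate (for example, frames per second), so that reliability reduces to P(capacity >= demand). The function names, the fixed-demand simplification, and the independence assumption behind the series and parallel combinations are all assumptions made for illustration; the paper's own expressions, including the role of the QoS threshold $\Theta$, may differ.

```python
import random


def uniform_reliability(c_min, c_max, demand):
    """P(C >= demand) for capacity C ~ Uniform[c_min, c_max] (assumed model)."""
    if demand <= c_min:
        return 1.0  # demand never exceeds the lowest declared capacity
    if demand >= c_max:
        return 0.0  # demand is at or above the highest declared capacity
    return (c_max - demand) / (c_max - c_min)


def series_reliability(reliabilities):
    """All devices must meet demand; assumes independent devices."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel_reliability(reliabilities):
    """At least one device must meet demand; assumes independent devices."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


def monte_carlo_reliability(c_min, c_max, demand, n=200_000, seed=0):
    """Sampling estimate of P(C >= demand) to cross-check the closed form."""
    rng = random.Random(seed)
    hits = sum(rng.uniform(c_min, c_max) >= demand for _ in range(n))
    return hits / n


if __name__ == "__main__":
    # Hypothetical device: capacity between 10 and 30 fps, demand of 15 fps.
    r_a = uniform_reliability(10.0, 30.0, 15.0)
    r_b = uniform_reliability(20.0, 40.0, 24.0)
    print("single device (closed form):", r_a)
    print("single device (Monte Carlo):", monte_carlo_reliability(10.0, 30.0, 15.0))
    print("series of two devices:", series_reliability([r_a, r_b]))
    print("parallel of two devices:", parallel_reliability([r_a, r_b]))
```

Under these assumptions, the closed-form value and the sampling estimate should agree up to Monte Carlo noise, which mirrors in miniature the kind of analytical-versus-sampling comparison the abstract describes.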
