How Reliable is Your Service at the Extreme Edge? Analytical Modeling of Computational Reliability

MHD Saria Allahham, Hossam S. Hassanein

TL;DR

This work addresses quantifying the reliability of AI inference streaming on consumer-owned devices in extreme edge computing by defining reliability as the probability that instantaneous capacity exceeds demand at a QoS threshold $\Theta$. It develops two regimes, Minimal Information (MI) with uniform bounds and Historical Data (MLE-based) with truncated normals, and derives closed-form single-device reliability alongside extensions to series, parallel, and partitioned multi-device configurations. It also introduces optimal workload partitioning via equal marginal log-reliability and device-selection bounds to achieve target reliability $\varepsilon$. The framework is validated using real-time object detection with YOLO11m, showing strong agreement with Monte Carlo and empirical measurements, and demonstrates meaningful throughput gains from distributed configurations. Overall, the approach provides tractable tools for orchestrators to assess feasibility, allocate work, and select devices for reliable distributed streaming at the extreme edge.

Abstract

Extreme Edge Computing (XEC) distributes streaming workloads across consumer-owned devices, exploiting their proximity to users and ubiquitous availability. Many such workloads are AI-driven, requiring continuous neural network inference for tasks like object detection and video analytics. Distributed Inference (DI), which partitions model execution across multiple edge devices, enables these streaming services to meet strict throughput and latency requirements. Yet consumer devices exhibit volatile computational availability due to competing applications and unpredictable usage patterns. This volatility poses a fundamental challenge: how can we quantify the probability that a device, or ensemble of devices, will maintain the processing rate required by a streaming service? This paper presents an analytical framework for computational reliability in XEC, defined as the probability that instantaneous capacity meets demand at a specified Quality of Service (QoS) threshold. We derive closed-form reliability expressions under two information regimes: Minimal Information (MI), which requires only declared operational bounds, and Historical Data, which refines estimates via Maximum Likelihood Estimation (MLE) from past observations. The framework extends to multi-device deployments, providing reliability expressions for series, parallel, and partitioned workload configurations. We derive optimal workload allocation rules and analytical bounds for device selection, equipping orchestrators with tractable tools to evaluate deployment feasibility and configure distributed streaming systems. We validate the framework using real-time object detection with the YOLO11m model as a representative DI streaming workload; experiments on emulated XED environments demonstrate close agreement between analytical predictions, Monte Carlo sampling, and empirical measurements across diverse capacity and demand configurations.

Paper Structure

This paper contains 24 sections, 5 theorems, 28 equations, 7 figures.

Key Result

Lemma 1

For an XED $i$ with computational capacity $C_i(t) \sim \text{U}(C_i^{\min}, C_i^{\max})$ and a streaming service with demand $\Delta_i(t) \sim \text{U}(\Delta_i^{\min}, \Delta_i^{\max})$, where $C_i(t)$ and $\Delta_i(t)$ are independent, if the condition $C_i^{\min} \leq \Theta \delta$ holds for th...
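The setting of Lemma 1 can be explored numerically even though its closed form is cut off in this excerpt. The sketch below assumes that reliability means $P(C_i(t) \geq \Theta \, \Delta_i(t))$ with independent uniform capacity and demand; that reading, the function names, and the example bounds are illustrative assumptions and are not taken from the paper. It compares a Monte Carlo estimate with a one-dimensional numerical integration over the demand; neither is the paper's closed-form expression.

```python
import random


def reliability_mc(c_min, c_max, d_min, d_max, theta, n=200_000, seed=0):
    """Monte Carlo estimate of P(C >= theta * D) for independent uniforms.

    Assumption: reliability is read as P(C >= theta * D); all names here
    are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        c = rng.uniform(c_min, c_max)  # sampled instantaneous capacity
        d = rng.uniform(d_min, d_max)  # sampled instantaneous demand
        if c >= theta * d:
            hits += 1
    return hits / n


def reliability_numeric(c_min, c_max, d_min, d_max, theta, steps=10_000):
    """Midpoint-rule average over demand of P(C >= theta * d | D = d).

    For uniform C, P(C >= x) = (c_max - x) / (c_max - c_min), clipped to [0, 1].
    """
    width = (d_max - d_min) / steps
    total = 0.0
    for k in range(steps):
        d = d_min + (k + 0.5) * width  # midpoint of the k-th demand slice
        p = (c_max - theta * d) / (c_max - c_min)
        total += min(1.0, max(0.0, p))
    return total / steps


if __name__ == "__main__":
    # Hypothetical bounds: capacity in [10, 30], demand in [5, 25], theta = 1.
    print(reliability_mc(10, 30, 5, 25, 1.0))
    print(reliability_numeric(10, 30, 5, 25, 1.0))
```

For these example bounds the integral can be done by hand and equals 0.71875, so both functions should return values close to 0.72. Agreement between the two is only a sanity check on the assumed definition, in the same spirit as the analytical versus Monte Carlo comparison reported in the figures.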

Figures (7)

  • Figure 1: The XEC system model
  • Figure 2: Spatial partitioning for distributed object detection. (a) Full frame processed on a single device. (b) Same frame partitioned into four quadrants, each processed independently by a separate worker.
  • Figure 3: Validation of $R^{\text{MI}}_i(t, \Theta)$. (a) Analytical vs Monte Carlo vs simulation. (b) Effect of capacity. (c) Effect of demand.
  • Figure 4: Validation of $R^{\text{H}}_i(t, \Theta)$. (a) Analytical vs MC vs simulation. (b) MLE convergence with sample size. (c) Effect of capacity. (d) Effect of demand.
  • Figure 5: MLE parameter convergence. (a) Reliability estimate evolution. (b) Mean convergence. (c) Standard deviation convergence.
  • ...and 2 more figures

Theorems & Definitions (10)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3: Optimal Work Partitioning
  • proof
  • Lemma 4: Series System Feasibility
  • proof
  • Lemma 5: Parallel System Device Selection
  • proof
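
The series and parallel configurations named in Lemmas 4 and 5 can be illustrated with the classical reliability-theory formulas for independent components. The sketch below is an assumption-laden illustration, not the paper's lemmas: it treats per-device reliabilities as given numbers, assumes devices fail independently, and reads the target $\varepsilon$ as a lower bound on system reliability. All function names are hypothetical.

```python
import math


def series_reliability(reliabilities):
    """Series system: every device must meet its demand (independence assumed)."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel_reliability(reliabilities):
    """Parallel system: at least one device must meet demand (independence assumed)."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


def min_parallel_devices(r, epsilon):
    """Smallest n with 1 - (1 - r)**n >= epsilon for n identical devices.

    Assumes 0 < r < 1 and 0 < epsilon < 1; identical devices are a
    simplification made for this sketch.
    """
    return math.ceil(math.log(1.0 - epsilon) / math.log(1.0 - r))


if __name__ == "__main__":
    print(series_reliability([0.9, 0.8]))
    print(parallel_reliability([0.9, 0.8]))
    print(min_parallel_devices(0.6, 0.99))
```

Under these assumptions, chaining devices in series can only lower reliability while replicating them in parallel can only raise it, which is why a series deployment raises a feasibility question and a parallel deployment raises a device-count question.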