MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference

Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris

TL;DR

MultiTASC++ addresses the challenge of scalable, high-accuracy DNN inference in indoor edge environments where many devices share a server. It introduces a continuously adaptive, multi-tenancy-aware scheduler that dynamically tunes per-device forwarding thresholds, updates service-level objective (SLO) targets, scales thresholds, and can switch server models to balance latency and accuracy. Key contributions include a formal system model for multi-device cascades, a continuous per-device threshold reconfiguration mechanism, SLO satisfaction rate updates, threshold scaling with device-load-aware multipliers, and a server-model-switching capability. Empirical results show that MultiTASC++ consistently maintains the target SLO satisfaction rate, achieves higher accuracy than the baselines, and scales throughput as the number of devices grows, including under intermittent device participation and with transformer models.
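The per-device threshold tuning described above can be illustrated with a minimal feedback-loop sketch. It is not the MultiTASC++ algorithm: the names `DeviceState` and `update_thresholds`, the fixed `step`, and the sign-based update rule are hypothetical assumptions. It assumes a device forwards a sample when its lightweight model's confidence falls below the device's threshold, so raising a threshold sends more samples to the server (higher accuracy, more server load) and lowering it sends fewer.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical per-device state tracked by the scheduler."""
    threshold: float   # forward a sample when the light model's confidence < threshold
    multiplier: float  # hypothetical load-aware scaling of the update step (> 0)


def update_thresholds(devices, satisfaction_rate, target_rate,
                      step=0.01, low=0.0, high=1.0):
    """Nudge every device's forwarding threshold toward the SLO target.

    Below target: the server is likely overloaded, so thresholds drop and
    fewer samples are forwarded. At or above target: there is headroom, so
    thresholds rise and more samples are forwarded for higher accuracy.
    """
    direction = 1.0 if satisfaction_rate >= target_rate else -1.0
    for state in devices.values():
        proposed = state.threshold + direction * step * state.multiplier
        # Clamp to the valid confidence range [low, high].
        state.threshold = min(high, max(low, proposed))
    return devices


if __name__ == "__main__":
    fleet = {
        "camera": DeviceState(threshold=0.5, multiplier=1.0),
        "speaker": DeviceState(threshold=0.5, multiplier=2.0),
    }
    # Measured satisfaction (0.90) is below a 0.95 target: both thresholds drop,
    # the higher-multiplier device by twice as much.
    update_thresholds(fleet, satisfaction_rate=0.90, target_rate=0.95)
    for name, state in fleet.items():
        print(name, round(state.threshold, 3))
```

Scaling each step by a per-device `multiplier` is one assumed way to realize the device-load-aware threshold scaling mentioned above; the paper's actual multipliers, update rule, and update schedule may differ.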

Abstract

Cascade systems, consisting of a lightweight model processing all samples and a heavier, high-accuracy model refining challenging samples, have become a widely adopted distributed inference approach to achieving high accuracy while maintaining a low computational burden for mobile and IoT devices. As intelligent indoor environments, like smart homes, continue to expand, a new scenario emerges: the multi-device cascade. In this setting, multiple diverse devices simultaneously utilize a shared heavy model hosted on a server, often situated within or close to the consumer environment. This work introduces MultiTASC++, a continuously adaptive multi-tenancy-aware scheduler that dynamically controls the forwarding decision functions of devices to optimize system throughput while maintaining high accuracy and low latency. Through extensive experimentation in diverse device environments and with varying server-side models, we demonstrate the scheduler's efficacy in consistently maintaining a targeted satisfaction rate while providing the highest available accuracy across different device tiers and workloads of up to 100 devices. This demonstrates its scalability and efficiency in addressing the unique challenges of collaborative DNN inference in dynamic and diverse IoT environments.
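As a rough illustration of a device's forwarding decision in such a cascade, the sketch below assumes the common top-1 softmax confidence criterion: a sample is forwarded to the server's heavy model when the lightweight model's confidence falls below a per-device threshold. `cascade_predict`, `light_model`, and `heavy_model` are hypothetical placeholders, not an API from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade_predict(x, light_model, heavy_model, threshold):
    """Run one sample through a two-stage cascade.

    `light_model` returns class logits on the device; `heavy_model` returns a
    class label from the shared server. Both are hypothetical callables.
    Returns (label, forwarded).
    """
    probs = softmax(light_model(x))
    confidence = max(probs)  # assumed criterion: top-1 softmax probability
    if confidence < threshold:
        # Low confidence: treat as a "challenging" sample and defer to the server.
        return heavy_model(x), True
    # High confidence: keep the lightweight model's own prediction.
    return probs.index(confidence), False


if __name__ == "__main__":
    light = lambda x: x    # stand-in: the "input" is already a logit vector
    heavy = lambda x: 7    # stand-in server model that always predicts class 7
    print(cascade_predict([5.0, 0.0, 0.0], light, heavy, threshold=0.8))  # (0, False)
    print(cascade_predict([1.0, 0.9, 0.8], light, heavy, threshold=0.8))  # (7, True)
```

Raising `threshold` forwards more samples to the server, trading extra server load and latency for accuracy; this is the per-device knob that a scheduler controlling the devices' forwarding decision functions would tune.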

Paper Structure

This paper contains 20 sections, 7 equations, 20 figures, 1 table, and 1 algorithm.

Figures (20)

  • Figure 1: Example of an AI-driven smart office.
  • Figure 2: System architecture of a multi-device cascade multitasc2023iscc.
  • Figure 3: Architecture of the MultiTASC++ scheduler.
  • Figure 4: SLO satisfaction rate for InceptionV3 - MobileNetV2.
  • Figure 5: Accuracy for InceptionV3 - MobileNetV2.
  • ...and 15 more figures