Multi-Dimensional Autoscaling of Stream Processing Services on Edge Devices
Boris Sedlak, Philipp Raith, Andrea Morichetta, Víctor Casamayor Pujol, Schahram Dustdar
TL;DR
This paper tackles autoscaling for resource-constrained Edge devices by introducing MUDAP, a platform that enables fine-grained vertical scaling across both service- and resource-level parameters. It pairs MUDAP with RASK, a regression-based scaling agent that builds explainable models of the processing environment and uses a numerical solver to compute optimal elasticity assignments, with the goal of maximizing SLO fulfillment across multiple competing services. The evaluation shows that RASK learns accurate models in about 20 training iterations (roughly 200 s of processing) and sustains high load with up to 28% fewer SLO violations than the Kubernetes VPA and a reinforcement learning (RL) baseline, while incurring minimal CPU overhead. The work demonstrates that adding elasticity dimensions improves SLO fulfillment, and it establishes a practical, modular approach to service-specific vertical scaling on Edge devices, with implications for responsive Edge intelligence that does not rely on offloading.
Abstract
Edge devices have limited resources, which inevitably leads to situations where stream processing services cannot obtain the resources they need. While existing autoscaling mechanisms focus entirely on resource scaling, Edge devices require alternative ways to sustain the Service Level Objectives (SLOs) of competing services. To address this, we introduce a Multi-dimensional Autoscaling Platform (MUDAP) that supports fine-grained vertical scaling across both service- and resource-level dimensions. MUDAP supports service-specific scaling tailored to the parameters each service exposes, e.g., scaling the data quality or model size of a particular service. To optimize the execution across services, we present a scaling agent based on Regression Analysis of Structural Knowledge (RASK). The RASK agent efficiently explores the solution space and learns a continuous regression model of the processing environment, from which it infers optimal scaling actions. We compared our approach with two autoscalers, the Kubernetes VPA and a reinforcement learning agent, for scaling up to 9 services on a single Edge device. Our results showed that RASK can infer an accurate regression model in merely 20 iterations (i.e., after observing 200 s of processing). By incrementally adding elasticity dimensions, RASK sustained the highest request load with 28% fewer SLO violations than the baselines.
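To make the idea of regression-driven, multi-dimensional scaling concrete, the following is a minimal sketch and not the RASK implementation. It assumes two hypothetical services (svc_a, svc_b), each with one resource-level parameter (CPU cores) and one service-level parameter (data quality), a linear throughput model fitted by ordinary least squares, and an exhaustive grid search standing in for a numerical solver. All coefficients, SLO targets, and function names are invented for illustration.

```python
from itertools import product

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (bias term added here)."""
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(beta, cores, quality):
    """Predicted throughput for one service under a given parameter assignment."""
    return beta[0] + beta[1] * cores + beta[2] * quality

def fulfillment(value, target):
    """SLO fulfillment clipped to [0, 1]; 1.0 means the target is met."""
    return max(0.0, min(1.0, value / target))

# Hypothetical ground truth used only to generate synthetic observations:
# throughput = b0 + b1 * cores + b2 * quality
TRUE = {"svc_a": (2.0, 10.0, -0.01), "svc_b": (1.0, 12.0, -0.005)}
# Hypothetical SLO targets: minimum throughput ("tp") and minimum quality ("q")
SLOS = {"svc_a": {"tp": 30.0, "q": 800.0}, "svc_b": {"tp": 20.0, "q": 600.0}}

# "Exploration": observe throughput for a few (cores, quality) settings
samples = list(product([1.0, 2.0, 3.0], [400.0, 800.0, 1200.0]))
models = {
    name: fit_linear(samples, [predict(beta, c, q) for c, q in samples])
    for name, beta in TRUE.items()
}

def score(assignment, models, slos):
    """Sum of throughput and quality SLO fulfillment over all services."""
    return sum(
        fulfillment(predict(models[s], c, q), slos[s]["tp"]) + fulfillment(q, slos[s]["q"])
        for s, (c, q) in assignment.items()
    )

def plan(models, slos, total_cores=4.0):
    """Grid search over core splits and quality levels (stand-in for a solver)."""
    core_steps = [0.5 * k for k in range(1, int(total_cores / 0.5))]
    qualities = [400.0, 600.0, 800.0, 1000.0, 1200.0]
    best = None
    for cores_a, q_a, q_b in product(core_steps, qualities, qualities):
        assignment = {"svc_a": (cores_a, q_a), "svc_b": (total_cores - cores_a, q_b)}
        s = score(assignment, models, slos)
        if best is None or s > best[0]:
            best = (s, assignment)
    return best

best_score, best_assignment = plan(models, SLOS)
print(best_score, best_assignment)
```

The sketch shows why a second elasticity dimension matters: when the core budget cannot satisfy both throughput targets, the planner can trade data quality against cores instead of only shifting cores between services.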
