SARA: A Stall-Aware Memory Allocation Strategy for Mixed-Criticality Systems
Meng-Chia Lee, Wen Sheng Lim, Yuan-Hao Chang, Tei-Wei Kuo
TL;DR
SARA addresses memory contention on memory-constrained edge devices that host mixed-criticality workloads, where page swapping and slow storage I/O put both soft real-time (soft RT) deadlines and non-real-time (non-RT) throughput at risk. It introduces a stall-aware memory allocator that uses a PSI-based per-interval metric, $s_{intv}$, to model how memory shortages affect soft RT execution times. From this model it derives an adaptive ideal stall per period, $s^{ideal}_{period}$, and per interval, $s^{ideal}_{intv}$, and uses them to allocate just enough memory to soft RT tasks to meet their deadlines while freeing the remaining memory for non-RT tasks. A stall-aware early job dropping mechanism identifies long stalls and proactively drops the affected soft RT jobs, reducing cascading delays. Empirical evaluation on real hardware shows that SARA achieves an average deadline hit ratio of $97.13\%$ for soft RT tasks and up to $22.32\times$ throughput gains for non-RT applications, even with memory capacity limited to $60\%$ of peak demand, outperforming greedy, offline, and PSI-based baselines. The work demonstrates practical, dynamic memory management for edge devices with mixed-criticality workloads, improving responsiveness and throughput without OS-level policy changes.
Abstract
The memory capacity of edge devices is often limited by constraints on cost, size, and power. Consequently, in memory-constrained mixed-criticality edge devices, competition for memory leads to inevitable page swapping, which incurs slow storage I/O and degrades performance. In such scenarios, inefficient memory allocation disrupts the performance balance among applications, causing soft real-time (soft RT) tasks to miss deadlines or preventing non-real-time (non-RT) applications from optimizing throughput. Meanwhile, we observe unpredictable, long system-level stalls (called long stalls) under high memory and I/O pressure, which further degrade performance. In this work, we propose a Stall-Aware Real-Time Memory Allocator (SARA), which balances performance by allocating just enough memory to soft RT tasks to meet their deadlines while optimizing the use of the remaining memory for non-RT applications. To minimize the memory usage of soft RT tasks while meeting real-time requirements, SARA leverages our insight into how the latency caused by memory insufficiency, measured by our proposed PSI-based metric, affects the execution time of each soft RT job, where one job runs per period and a soft RT task spans multiple periods. Moreover, SARA detects long stalls according to our definition and proactively drops the affected jobs, minimizing stalls in task execution. Experiments show that SARA achieves an average deadline hit ratio of 97.13% for soft RT tasks and improves non-RT application throughput by up to 22.32x over existing approaches, even with memory capacity limited to 60% of peak demand.
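As an illustration of what a PSI-based per-interval measurement might look like, the following minimal sketch parses the text format of Linux's `/proc/pressure/memory` and takes the difference of the cumulative `total` stall counter (reported in microseconds) between two samples. It is not SARA's implementation: the function names, the choice of the `some` line, and the fixed drop threshold are assumptions made for illustration, and the abstract does not state the exact definitions of $s_{intv}$ or of a long stall.

```python
def parse_psi_total_us(psi_text: str, kind: str = "some") -> int:
    """Return the cumulative stall time in microseconds from PSI text.

    psi_text is the content of a PSI file such as /proc/pressure/memory.
    kind selects the "some" or "full" line.
    """
    for line in psi_text.splitlines():
        fields = line.split()
        if fields and fields[0] == kind:
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "total":
                    return int(value)
    raise ValueError(f"no '{kind}' line with a total= field")


def interval_stall_us(prev_total_us: int, curr_total_us: int) -> int:
    """Stall time accumulated between two samples of the PSI counter."""
    return curr_total_us - prev_total_us


def should_drop_job(stall_us: int, long_stall_threshold_us: int) -> bool:
    """Hypothetical early-drop rule: drop when the interval stall exceeds
    a threshold. The threshold is an assumption, not the paper's definition."""
    return stall_us > long_stall_threshold_us


if __name__ == "__main__":
    # Two made-up samples in the PSI text format, taken one interval apart.
    before = ("some avg10=0.00 avg60=0.00 avg300=0.00 total=1000\n"
              "full avg10=0.00 avg60=0.00 avg300=0.00 total=400\n")
    after = ("some avg10=1.20 avg60=0.30 avg300=0.05 total=6000\n"
             "full avg10=0.50 avg60=0.10 avg300=0.02 total=900\n")
    stall = interval_stall_us(parse_psi_total_us(before),
                              parse_psi_total_us(after))
    print(stall, should_drop_job(stall, long_stall_threshold_us=4000))
```

On a Linux system with PSI enabled, the two samples would instead be read from `/proc/pressure/memory` at the start and end of each interval.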
