An Interference-aware Approach for Co-located Container Orchestration with Novel Metric
Xiang Li, Linfeng Wen, Minxian Xu, Kejiang Ye
TL;DR
The paper tackles the interference that arises when online real-time and offline batch workloads are co-located on shared container infrastructure. It introduces scheduling latency as a novel interference metric, builds a two-level predictive framework (per-node and per-pod) to quantify and forecast interference, and implements a latency-aware scheduling algorithm that balances online performance with resource utilization. Compared with three baseline schedulers, the approach can reduce the average, 90th percentile, and 99th percentile response times of online services by 29.4%, 31.4%, and 14.5%, respectively, with lower variance in CPU/memory usage across cluster nodes. By tying scheduling decisions to interference predictions, the work enables more reliable and efficient co-location of mixed online/offline workloads in containerized environments.
Abstract
Container orchestration technologies are widely employed in cloud computing, facilitating the co-location of online and offline services on the same infrastructure. Online services demand rapid responsiveness and high availability, whereas offline services require extensive computational resources. This mixed deployment can lead to resource contention that degrades the performance of online services, yet the metrics used by existing methods cannot accurately reflect the extent of interference. In this paper, we introduce scheduling latency as a novel metric for quantifying interference and compare it with existing metrics. Empirical evidence demonstrates that scheduling latency more accurately reflects the performance degradation of online services. We also use various machine learning techniques to predict the interference that online services would experience on specific hosts, providing reference information for subsequent scheduling decisions, and we propose a method for quantifying node interference based on scheduling latency. To enhance resource utilization, we train a model for online services that predicts CPU and memory allocation based on workload type and queries per second (QPS). Finally, we present a scheduling algorithm based on predictive modeling that aims to reduce interference for online services while balancing node resource utilization. In experiments against three baseline methods, our approach can reduce the average, 90th percentile, and 99th percentile response times of online services by 29.4%, 31.4%, and 14.5%, respectively.
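To make the scheduling idea concrete, the following is a minimal sketch of how a latency-aware placement decision could combine a predicted interference value with a resource-balance term. It is not the authors' algorithm: the `Node` fields, the `pick_node` function, the variance-based balance term, and the weights `alpha` and `beta` are all hypothetical, and the predicted scheduling latency is assumed to come from some separately trained model.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class Node:
    name: str
    cpu_util: float              # fraction of CPU in use, 0.0 to 1.0
    mem_util: float              # fraction of memory in use, 0.0 to 1.0
    predicted_latency_ms: float  # assumed output of an interference model


def imbalance_after_placement(nodes, target, cpu_req, mem_req):
    """Variance of CPU and memory utilization if the pod lands on `target`."""
    cpu = [n.cpu_util + (cpu_req if n is target else 0.0) for n in nodes]
    mem = [n.mem_util + (mem_req if n is target else 0.0) for n in nodes]
    return pvariance(cpu) + pvariance(mem)


def pick_node(nodes, cpu_req, mem_req, alpha=1.0, beta=100.0):
    """Return the feasible node with the lowest combined score, or None."""
    # Keep only nodes that can still fit the requested CPU and memory.
    feasible = [
        n for n in nodes
        if n.cpu_util + cpu_req <= 1.0 and n.mem_util + mem_req <= 1.0
    ]
    if not feasible:
        return None

    def score(n):
        # Lower is better: predicted interference plus weighted imbalance.
        balance = imbalance_after_placement(nodes, n, cpu_req, mem_req)
        return alpha * n.predicted_latency_ms + beta * balance

    return min(feasible, key=score)
```

The weights trade off online-service interference against cluster balance; in this sketch they are arbitrary illustrative values, and a real scheduler would need to tune them or replace the linear score altogether.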
