Hybrid Learning for Cold-Start-Aware Microservice Scheduling in Dynamic Edge Environments
Jingxi Lu, Wenhao Li, Jianxiong Guo, Xingjian Ding, Zhiqing Tang, Tian Wang, Weijia Jia
TL;DR
The paper tackles microservice scheduling on dynamic edge resources, addressing the cold-start problem in online reinforcement learning. It introduces a two-phase framework that first learns from offline expert demonstrations via imitation learning and then fine-tunes online with a GRU-enhanced Soft Actor-Critic policy. A novel policy network decouples slow-changing node state from fast-changing microservice state, and an action mask enforces feasibility across edge nodes. Empirical results show significant improvements in convergence speed and final objective (latency-energy trade-off) compared with baselines, and robustness across varied edge configurations. The work advances practical, cold-start-aware scheduling for edge computing with containerized microservices.
Abstract
With the rapid growth of IoT devices and their diverse workloads, container-based microservices deployed at edge nodes have become a lightweight and scalable solution. However, existing microservice scheduling algorithms often assume static resource availability, which is unrealistic when multiple containers are assigned to the same edge node. Moreover, currently popular reinforcement learning (RL) algorithms suffer from cold-start inefficiencies during early-stage training when applied to container scheduling. In this paper, we propose a hybrid learning framework that combines offline imitation learning (IL) with online Soft Actor-Critic (SAC) optimization to enable cold-start-aware microservice scheduling with dynamic allocation of computing resources. We first formulate a delay-and-energy-aware scheduling problem and construct a rule-based expert to generate demonstration data for behavior cloning. We then design a GRU-enhanced policy network that extracts the correlation among multiple decisions by separately encoding slow-evolving node states and fast-changing microservice features, together with an action selection mechanism that speeds up convergence. Extensive experiments show that our method significantly accelerates convergence and achieves superior final performance. Compared with baselines, our algorithm improves the total objective by $50\%$ and convergence speed by $70\%$, and demonstrates the highest stability and robustness across various edge configurations.
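As a rough illustration of how an action mask can restrict a scheduling policy to feasible edge nodes, the sketch below zeroes out the probability of any node that cannot host the incoming microservice. The feasibility rule (spare CPU and memory at least equal to the request), the function names, and the numbers are all assumptions made for this example; the paper's actual action selection mechanism may differ.

```python
import math


def feasibility_mask(node_free_cpu, node_free_mem, req_cpu, req_mem):
    """Mark a node feasible when it has enough spare CPU and memory.

    Hypothetical rule chosen for illustration only.
    """
    return [cpu >= req_cpu and mem >= req_mem
            for cpu, mem in zip(node_free_cpu, node_free_mem)]


def masked_softmax(logits, mask):
    """Softmax over feasible entries; infeasible entries get probability 0."""
    if not any(mask):
        raise ValueError("no feasible node for this microservice")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Three edge nodes; the second lacks the resources for this request.
logits = [2.0, 0.5, 1.0]  # hypothetical policy scores, one per node
mask = feasibility_mask([4.0, 1.0, 2.0], [8.0, 2.0, 4.0],
                        req_cpu=2.0, req_mem=3.0)
probs = masked_softmax(logits, mask)
```

Because infeasible nodes are excluded before sampling, the policy never spends early training steps on placements that would be rejected, which is one plausible reason masking helps convergence.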
