Hybrid Learning for Cold-Start-Aware Microservice Scheduling in Dynamic Edge Environments

Jingxi Lu, Wenhao Li, Jianxiong Guo, Xingjian Ding, Zhiqing Tang, Tian Wang, Weijia Jia

TL;DR

The paper tackles microservice scheduling on dynamic edge resources, addressing the cold-start problem of online reinforcement learning. It introduces a two-phase framework that first learns from offline expert demonstrations via imitation learning and then fine-tunes online with a GRU-enhanced Soft Actor-Critic policy. A novel policy network decouples slow-changing node states from fast-changing microservice states, and an action mask enforces feasibility across edge nodes. Empirical results show significant improvements in convergence speed and final objective (a latency-energy trade-off) compared with baselines, along with robustness across varied edge configurations. The work advances practical, cold-start-aware scheduling for edge computing with containerized microservices.
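The action mask mentioned above can be illustrated with a minimal sketch. It assumes a discrete action space in which each action selects one edge node, and a hypothetical feasibility rule (a node is feasible when its free CPU and memory cover the microservice's demand); the function names and the two-resource rule are illustrative, not taken from the paper. Infeasible nodes are given exactly zero probability before an action is sampled.

```python
import math
from typing import List

def feasibility_mask(free_cpu: List[float], free_mem: List[float],
                     cpu_req: float, mem_req: float) -> List[bool]:
    """Hypothetical rule: a node is feasible if its free CPU and memory cover the container's demand."""
    return [c >= cpu_req and m >= mem_req for c, m in zip(free_cpu, free_mem)]

def masked_softmax(logits: List[float], mask: List[bool]) -> List[float]:
    """Softmax over edge nodes in which infeasible nodes get exactly zero probability."""
    if not any(mask):
        raise ValueError("no feasible edge node for this microservice")
    # Subtract the largest feasible logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Example: 3 edge nodes; the microservice needs 1.0 CPU and 0.5 memory (hypothetical units).
mask = feasibility_mask([2.0, 0.5, 4.0], [1.0, 2.0, 1.0], cpu_req=1.0, mem_req=0.5)
print(mask)                                    # [True, False, True]
print(masked_softmax([0.0, 5.0, 0.0], mask))   # [0.5, 0.0, 0.5]
```

Masking the logits, rather than penalizing infeasible choices through the reward, guarantees that every sampled action is feasible, even though node 1 has the largest raw logit in the example.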

Abstract

With the rapid growth of IoT devices and their diverse workloads, container-based microservices deployed at edge nodes have become a lightweight and scalable solution. However, existing microservice scheduling algorithms often assume static resource availability, which is unrealistic when multiple containers share an edge node. Moreover, when currently popular reinforcement learning (RL) algorithms are used to schedule containers, they suffer from cold-start inefficiency during early-stage training. In this paper, we propose a hybrid learning framework that combines offline imitation learning (IL) with online Soft Actor-Critic (SAC) optimization to enable cold-start-aware microservice scheduling with dynamic allocation of computing resources. We first formulate a delay-and-energy-aware scheduling problem and construct a rule-based expert to generate demonstration data for behavior cloning. Then, we design a GRU-enhanced policy network that captures the correlation among multiple scheduling decisions by separately encoding slow-evolving node states and fast-changing microservice features, together with an action selection mechanism that speeds up convergence. Extensive experiments show that our method significantly accelerates convergence and achieves superior final performance. Compared with baselines, our algorithm improves the total objective by 50% and convergence speed by 70%, and demonstrates the highest stability and robustness across various edge configurations.
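The offline phase described in the abstract (a rule-based expert produces demonstrations, then the policy is trained by behavior cloning) can be sketched as follows. Everything concrete here is an assumption for illustration: each node is summarized by hypothetical (delay, energy) estimates, the expert greedily picks the node minimizing `w_delay * delay + w_energy * energy` with invented weights 0.7 and 0.3, all nodes are assumed feasible, and the policy is a linear softmax over nodes standing in for the paper's GRU-enhanced actor. Behavior cloning is shown as minimizing the cross-entropy between the policy and the expert's chosen node.

```python
import math
import random
from typing import List, Sequence, Tuple

Features = List[Tuple[float, float]]  # per-node (delay, energy) estimates; hypothetical cost model

def expert_action(nodes: Features, w_delay: float = 0.7, w_energy: float = 0.3) -> int:
    """Hypothetical rule-based expert: pick the node with the lowest weighted delay + energy cost."""
    costs = [w_delay * d + w_energy * e for d, e in nodes]
    return min(range(len(nodes)), key=costs.__getitem__)

def policy_probs(theta: Sequence[float], nodes: Features) -> List[float]:
    """Linear softmax policy: logit_j = theta . f_j (a stand-in for the GRU-enhanced actor)."""
    logits = [theta[0] * d + theta[1] * e for d, e in nodes]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [x / total for x in exps]

def bc_loss(theta: Sequence[float], demos) -> float:
    """Mean cross-entropy between the policy and the expert's chosen node."""
    return -sum(math.log(policy_probs(theta, nodes)[a]) for nodes, a in demos) / len(demos)

def behavior_cloning(demos, lr: float = 1.0, epochs: int = 200) -> List[float]:
    """Full-batch gradient descent on the behavior-cloning loss."""
    theta = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for nodes, a in demos:
            p = policy_probs(theta, nodes)
            # Gradient of -log p_a w.r.t. theta is E_p[f] - f_a.
            for k in range(2):
                grad[k] += sum(pj * f[k] for pj, f in zip(p, nodes)) - nodes[a][k]
        theta = [t - lr * g / len(demos) for t, g in zip(theta, grad)]
    return theta

random.seed(0)
states = [[(random.random(), random.random()) for _ in range(3)] for _ in range(200)]
demos = [(nodes, expert_action(nodes)) for nodes in states]  # offline demonstration data
theta = behavior_cloning(demos)
print("BC loss before/after:", bc_loss([0.0, 0.0], demos), bc_loss(theta, demos))
```

In the framework summarized above, parameters obtained this way would initialize the actor before online SAC fine-tuning, so early episodes start from expert-like behavior instead of a uniform policy.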

Paper Structure

This paper contains 31 sections, 23 equations, 11 figures, 1 table, and 4 algorithms.

Figures (11)

  • Figure 1: The microservice management architecture. It shows how microservices are deployed in containers across edge nodes to optimize service delay and energy consumption while maintaining acceptable performance under variable loads.
  • Figure 2: Overview of the system model. In each time slot $t$, $K_t$ microservices need to be offloaded to eligible edge nodes at the same time. In the actual scheduling, however, they are processed sequentially, as described in Sec. 4.2.
  • Figure 3: Overview of the SAC-based framework.
  • Figure 4: Convergence performance of different scheduling algorithms over 350 episodes. The plot shows smoothed reward trajectories under identical environment settings. Compared models include our GRU-based Hybrid_SAC and its behavior cloning variant (BC_Hybrid_SAC), GRU-based PPO (Hybrid_PPO), standard FC-based SAC and PPO, and their BC variants. GRU_SAC removes the hybrid structure to assess its impact. DQN and the rule-based Greedy strategy serve as value-based and heuristic baselines. BC_Hybrid_SAC achieves the highest reward and fastest convergence.
  • Figure 5: Comparison of convergence performance across different node settings with and without Behavior Cloning. Each subfigure shows the smoothed reward trajectories of Hybrid_SAC and SAC models, both with and without BC initialization.
  • ...and 6 more figures
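The sequential processing described in the Figure 2 caption can be sketched as a loop that places the $K_t$ microservices of one time slot one at a time and updates node resources after each placement, so every later decision sees the resources left by earlier ones. The two-resource node model, the class and function names, and the "most free CPU" placeholder policy are hypothetical; the paper's learned SAC policy would take the place of `most_free_cpu`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class EdgeNode:
    free_cpu: float  # remaining CPU capacity (hypothetical units)
    free_mem: float  # remaining memory (hypothetical units)

@dataclass
class Microservice:
    cpu: float
    mem: float

Policy = Callable[[List[EdgeNode], Microservice, List[bool]], int]

def schedule_time_slot(nodes: List[EdgeNode], batch: List[Microservice],
                       policy: Policy) -> List[Optional[int]]:
    """Place one time slot's microservices sequentially, updating node resources after each decision."""
    placements: List[Optional[int]] = []
    for ms in batch:
        mask = [n.free_cpu >= ms.cpu and n.free_mem >= ms.mem for n in nodes]
        if not any(mask):
            placements.append(None)  # no feasible node: left unplaced in this sketch
            continue
        j = policy(nodes, ms, mask)
        if not mask[j]:
            raise ValueError("policy chose an infeasible node")
        # Resources shrink, so the next microservice sees the updated node state.
        nodes[j].free_cpu -= ms.cpu
        nodes[j].free_mem -= ms.mem
        placements.append(j)
    return placements

def most_free_cpu(nodes: List[EdgeNode], ms: Microservice, mask: List[bool]) -> int:
    """Placeholder policy: the feasible node with the most free CPU (not the paper's learned policy)."""
    return max((j for j, ok in enumerate(mask) if ok), key=lambda j: nodes[j].free_cpu)

nodes = [EdgeNode(2.0, 2.0), EdgeNode(3.0, 1.0)]
batch = [Microservice(1.5, 0.5), Microservice(1.0, 1.0), Microservice(2.0, 2.0)]
placements = schedule_time_slot(nodes, batch, most_free_cpu)
print(placements)  # [1, 0, None]
```

In the example, the third microservice finds no feasible node only because the first two placements consumed capacity, which is exactly the dynamic resource availability that a static-resource assumption would miss.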