Towards Safe Load Balancing based on Control Barrier Functions and Deep Reinforcement Learning

Lam Dinh, Pham Tran Anh Quang, Jérémie Leguay

TL;DR

This work tackles the safety challenge of applying reinforcement learning to SD-WAN load balancing by combining a deep RL controller with a Control Barrier Function (CBF) to enforce a capacity-safety constraint $\mu \leq 1$ throughout training and deployment. A local-search–based CBF projects unsafe proto-actions to safe actions, enabling hard safety guarantees while preserving learning efficiency. Empirical results show that PPO with CBF (PPO-CBF) achieves near-optimal end-to-end tunnel delay while strictly avoiding capacity violations, outperforming DDPG-based safety approaches. GPU-accelerated training delivers practical update times (≈110× faster), making safe, online RL-based load balancing feasible in real networks.

Abstract

Deep Reinforcement Learning (DRL) algorithms have recently made significant strides in improving network performance. Nonetheless, their practical use is still limited in the absence of safe exploration and safe decision-making. In the context of commercial solutions, reliable and safe-to-operate systems are of paramount importance. Taking this problem into account, we propose a safe learning-based load balancing algorithm for Software-Defined Wide Area Networks (SD-WAN), which is powered by DRL combined with a Control Barrier Function (CBF). It safely projects unsafe actions into feasible ones during both training and testing, and it guides learning towards safe policies. We successfully implemented the solution on GPU to accelerate training by approximately 110x and achieve model updates for on-policy methods within a few seconds, making the solution practical. We show that our approach delivers near-optimal Quality-of-Service (QoS) performance in terms of end-to-end delay while respecting safety requirements related to link capacity constraints. We also demonstrate that on-policy learning based on Proximal Policy Optimization (PPO) performs better than off-policy learning with Deep Deterministic Policy Gradient (DDPG) when both are combined with a CBF for safe load balancing.
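
To make the idea of projecting an unsafe action into a feasible one concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that an action is a vector of traffic split ratios over tunnels, that the safety measure is the maximum tunnel utilization (demand share divided by capacity), and that a simple greedy local search stands in for the CBF projection. The names `max_utilization` and `project_to_safe`, the step size, and the example numbers are all hypothetical.

```python
from typing import List


def max_utilization(split: List[float], demand: float, capacity: List[float]) -> float:
    """Assumed safety measure: the highest per-tunnel utilization."""
    return max(demand * s / c for s, c in zip(split, capacity))


def project_to_safe(
    proto: List[float],
    demand: float,
    capacity: List[float],
    step: float = 0.01,
    max_iters: int = 10_000,
) -> List[float]:
    """Greedy local search (illustrative stand-in for a CBF projection).

    Repeatedly moves a small share of traffic from the most utilized
    tunnel to the least utilized one until every utilization is <= 1.
    """
    if demand > sum(capacity):
        raise ValueError("demand exceeds total capacity: no safe split exists")

    split = list(proto)  # work on a copy; the proto-action is left untouched
    for _ in range(max_iters):
        util = [demand * s / c for s, c in zip(split, capacity)]
        if max(util) <= 1.0:
            return split  # already safe: return without further changes
        hi = util.index(max(util))   # most loaded tunnel
        lo = util.index(min(util))   # least loaded tunnel
        delta = min(step, split[hi])  # never drive a ratio below zero
        split[hi] -= delta
        split[lo] += delta
    raise ValueError("local search did not reach a safe split")


# Hypothetical example: two tunnels, the proto-action overloads the first one.
proto_action = [0.9, 0.1]
safe_action = project_to_safe(proto_action, demand=10.0, capacity=[6.0, 8.0])
print(safe_action, max_utilization(safe_action, 10.0, [6.0, 8.0]))
```

In this sketch a proto-action that is already safe is returned unchanged, so the projection only intervenes when the capacity constraint would be violated. A fixed step size keeps the search simple but can fail when the safe region is narrower than one step, which is why the function raises instead of returning an unsafe split.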

Paper Structure

This paper contains 11 sections, 12 equations, 8 figures, 1 table, and 2 algorithms.

Figures (8)

  • Figure 1: SD-WAN network with a headquarters and 3 branches.
  • Figure 2: Safety-based actor-critic learning architecture.
  • Figure 3: From proto-policy to safe policy with CBF.
  • Figure 4: Example of tunnels' traffic over a window of 1000 s.
  • Figure 5: System architecture with 1) network environment running on CPU and 2) safe RL algorithms on GPU.
  • ...and 3 more figures