Causal Online Learning of Safe Regions in Cloud Radio Access Networks
Kim Hammar, Tansu Alpcan, Emil Lupu
TL;DR
Causal Online Learning (COL) addresses safe online control in cloud RANs by identifying a safe operating region for dynamic resource configurations. It uses causal inference to bootstrap an initial safe region from passive observations, then expands that region through interventions using Gaussian-process-based Bayesian learning, guided by an information-gain-per-cost acquisition rule. COL provides probabilistic safety guarantees during learning and converges to the full safe region under standard assumptions. On a 5G testbed, it is up to 10x more sample-efficient than non-causal baselines. The approach enables safe autonomous management of RANs and offers a path to extending safe online learning to other networked systems and digital twins.
Abstract
Cloud radio access networks (RANs) enable cost-effective management of mobile networks by dynamically scaling their capacity on demand. However, deploying adaptive controllers to implement such dynamic scaling in operational networks is challenging due to the risk of breaching service agreements and operational constraints. To mitigate this challenge, we present a novel method for learning the safe operating region of the RAN, i.e., the set of resource allocations and network configurations for which its specification is fulfilled. The method, which we call (C)ausal (O)nline (L)earning, operates in two online phases: an inference phase and an intervention phase. In the first phase, we passively observe the RAN to infer an initial safe region via causal inference and Gaussian process regression. In the second phase, we gradually expand this region through interventional Bayesian learning. We prove that COL ensures that the learned region is safe with a specified probability and that it converges to the full safe region under standard conditions. We experimentally validate COL on a 5G testbed. The results show that COL quickly learns the safe region while incurring low operational cost and being up to 10x more sample-efficient than current state-of-the-art methods for safe learning.
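The two phases described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a one-dimensional configuration (e.g., a normalized CPU allocation), a hypothetical safety margin whose non-negative values mean the specification is fulfilled, a zero-mean GP with a squared-exponential kernel, and the standard GP mutual-information term as the information gain. All function names, the kernel parameters, and the confidence multiplier `beta` are assumptions made for illustration; the causal inference step used to bootstrap the initial region is not modeled here.

```python
import numpy as np

# Hypothetical hyperparameters (assumptions, not values from the paper).
NOISE, LENGTH, VAR = 0.05, 0.2, 1.0

def rbf(a, b):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return VAR * np.exp(-0.5 * (d / LENGTH) ** 2)

def gp_posterior(x_obs, y_obs, x_grid):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    K = rbf(x_obs, x_obs) + NOISE**2 * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_grid)
    mean = Ks.T @ np.linalg.solve(K, y_obs)
    var = VAR - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def safe_mask(mean, std, beta=2.0):
    """A point is deemed safe if the lower confidence bound of its margin is >= 0."""
    return mean - beta * std >= 0.0

def next_intervention(std, safe, cost):
    """Index of the safe point with the largest information gain per unit cost."""
    gain = 0.5 * np.log1p(std**2 / NOISE**2)  # GP mutual information of one sample
    score = np.where(safe, gain / cost, -np.inf)
    return int(np.argmax(score))

def expand_safe_region(margin_fn, x_obs, x_grid, cost, steps=10):
    """Intervention loop: probe the best safe point, observe, and refit the GP."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = margin_fn(x_obs)
    for _ in range(steps):
        mean, std = gp_posterior(x_obs, y_obs, x_grid)
        safe = safe_mask(mean, std)
        if not safe.any():  # nothing certified safe, so do not intervene
            break
        i = next_intervention(std, safe, cost)
        x_obs = np.append(x_obs, x_grid[i])
        y_obs = np.append(y_obs, margin_fn(x_grid[i]))
    mean, std = gp_posterior(x_obs, y_obs, x_grid)
    return x_obs, safe_mask(mean, std)
```

As a usage example under the same assumptions, calling `expand_safe_region(lambda x: 2 * x - 0.6, [0.8, 0.85, 0.9], np.linspace(0, 1, 51), 1 + np.linspace(0, 1, 51))` starts from three passive observations and returns the probed configurations together with a Boolean mask of grid points currently certified as safe. Because the GP prior has zero mean, unobserved configurations start out uncertified, so the loop only ever intervenes at points whose lower confidence bound is non-negative.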
