Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation
Cengis Hasan, Alexandros Agapitos, David Lynch, Alberto Castagna, Giorgio Cruciata, Hao Wang, Aleksandar Milenovic
TL;DR
The paper tackles the long lead time needed to deploy cell-level parameter optimisation at new wireless network sites by formulating stage-wise tuning of parameter subsets as continual model-based reinforcement learning over expanding action spaces. It combines a probabilistic reward ensemble, autoencoder-based state compression, and the Progress-&-Compress framework to achieve data-efficient transfer across tasks while avoiding catastrophic forgetting. Empirical results show a two-fold reduction in deployment lead time, up to a 4% throughput gain, and substantial reductions in data needs, memory, and training time, with 80 ms inference for large-scale networks. The work demonstrates practical gains in adapting to diverse sites and traffic conditions, and outlines future directions in causal structure learning and sim-to-real policy warm-starting to further improve efficiency and robustness.
Abstract
We present a method that addresses the long lead time required to deploy cell-level parameter optimisation policies to new wireless network sites. Given a sequence of action spaces, each represented by a subset of cell-level configuration parameters provided by domain experts and overlapping with the others, we formulate throughput optimisation as Continual Reinforcement Learning of control policies. Simulation results suggest that the proposed system shortens the end-to-end deployment lead time by a factor of two compared to a reinitialise-and-retrain baseline, without any drop in optimisation gain.
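
The setting described above, a sequence of overlapping action spaces, can be illustrated with a toy sketch. It is not the paper's Progress-&-Compress procedure: it only shows which knowledge is reusable when consecutive stages share parameters, in contrast to a reinitialise-and-retrain baseline. The parameter names, the per-parameter scalar "weights", and the helper functions are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Set

# Hypothetical cell-level parameter names; the actual subsets are
# supplied by domain experts and are not listed in this summary.
STAGES: List[Set[str]] = [
    {"tilt", "tx_power"},
    {"tx_power", "handover_margin"},
    {"tilt", "handover_margin", "load_balance_threshold"},
]


def shared_parameters(prev: Set[str], curr: Set[str]) -> Set[str]:
    """Parameters present in both consecutive action spaces."""
    return prev & curr


def warm_start(prev_weights: Dict[str, float],
               curr: Set[str],
               init: float = 0.0) -> Dict[str, float]:
    """Toy transfer: reuse weights of shared parameters, initialise the rest.

    Assumption: one scalar weight per parameter stands in for whatever
    a real policy would learn about that parameter.
    """
    return {name: prev_weights.get(name, init) for name in sorted(curr)}


def reinitialise(curr: Set[str], init: float = 0.0) -> Dict[str, float]:
    """Baseline: discard everything learned in earlier stages."""
    return {name: init for name in sorted(curr)}


if __name__ == "__main__":
    learned_stage_0 = {"tilt": 0.5, "tx_power": -0.2}
    print(shared_parameters(STAGES[0], STAGES[1]))
    print(warm_start(learned_stage_0, STAGES[1]))
    print(reinitialise(STAGES[1]))
```

In this sketch the warm-started stage keeps the value learned for the shared parameter and starts only the new parameter from scratch, whereas the baseline restarts every parameter.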
