Anomaly Detection for Scalable Task Grouping in Reinforcement Learning-based RAN Optimization

Jimmy Li, Igor Kozlov, Di Wu, Xue Liu, Gregory Dudek

TL;DR

The paper tackles the problem of scaling reinforcement learning for cellular RAN optimization to thousands of sites. It introduces a scalable policy bank that reuses an existing policy when a task is compatible with it and trains a new policy only for incompatible tasks. Compatibility is assessed with anomaly-detection (AD) methods, BG and TREX-DINO (TD), by framing the assessment as change-point detection on time-series interaction data; distillation is used to cap the bank size. Results on a system-level simulator show that the AD-based methods achieve high performance-to-training ratios $\xi$ while reaching cumulative rewards $\rho$ comparable to baselines that train a policy for every task, and that TD is robust to the choice of threshold. The framework makes it practical to deploy multitask RL controllers in real-world RANs and can be extended to other RAN optimization problems such as energy saving.

Abstract

The use of learning-based methods for optimizing cellular radio access networks (RAN) has received increasing attention in recent years. This coincides with a rapid increase in the number of cell sites worldwide, driven largely by dramatic growth in cellular network traffic. Training and maintaining learned models that work well across a large number of cell sites has thus become a pertinent problem. This paper proposes a scalable framework for constructing a reinforcement learning policy bank that can perform RAN optimization across a large number of cell sites with varying traffic patterns. Central to our framework is a novel application of anomaly detection techniques to assess the compatibility between sites (tasks) and the policy bank. This allows our framework to intelligently identify when a policy can be reused for a task, and when a new policy needs to be trained and added to the policy bank. Our results show that our approach to compatibility assessment leads to an efficient use of computational resources, by allowing us to construct a performant policy bank without exhaustively training on all tasks, which makes it applicable under real-world constraints.
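
The reuse-or-train loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Policy`, `anomaly_score`, `build_policy_bank`, `toy_rollout`, and `toy_train` are hypothetical, the z-score test is a simple stand-in for the paper's BG and TREX-DINO detectors, tasks are reduced to a single traffic-load number, and the distillation step that caps the bank size is omitted.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


@dataclass
class Policy:
    trained_on: float              # toy task descriptor (assumed: a traffic load)
    reference_rewards: List[float]  # rewards seen on the task it was trained for


def anomaly_score(reference: Sequence[float], observed: Sequence[float]) -> float:
    """Stand-in detector: z-score of the observed mean against the reference."""
    sigma = pstdev(reference) or 1e-9  # guard against a zero spread
    return abs(mean(observed) - mean(reference)) / sigma


def build_policy_bank(
    tasks: Sequence[float],
    rollout: Callable[[Policy, float], List[float]],
    train: Callable[[float], Policy],
    threshold: float,
) -> Tuple[List[Policy], int]:
    """Reuse a banked policy when its interaction data looks non-anomalous
    on the new task; otherwise train a new policy and add it to the bank."""
    bank: List[Policy] = []
    trained = 0
    for task in tasks:
        compatible = any(
            anomaly_score(p.reference_rewards, rollout(p, task)) <= threshold
            for p in bank
        )
        if not compatible:
            bank.append(train(task))
            trained += 1
    return bank, trained


# Toy environment (assumption): reward drops as the task drifts away from
# the task the policy was trained on.
def toy_rollout(policy: Policy, task: float) -> List[float]:
    base = 1.0 - abs(task - policy.trained_on)
    return [base - 0.05, base, base + 0.05]


def toy_train(task: float) -> Policy:
    policy = Policy(trained_on=task, reference_rewards=[])
    policy.reference_rewards = toy_rollout(policy, task)
    return policy


if __name__ == "__main__":
    bank, trained = build_policy_bank(
        [0.0, 0.1, 0.9, 1.0, 0.05], toy_rollout, toy_train, threshold=3.0
    )
    print(f"trained {trained} of 5 tasks; bank size {len(bank)}")
```

In this toy setting only the tasks that differ sharply from everything already in the bank trigger training, which is the source of the savings the abstract describes. The threshold here plays the role of the compatibility threshold varied in the paper's experiments, although its scale is specific to the stand-in detector.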

Paper Structure (16 sections, 3 figures, 1 algorithm)

Figures (3)

  • Figure 1: Performance $\rho$ and performance-to-training ratio $\xi$ for all methods across different hyperparameter settings. Error bars denote standard deviation across 5 different random seeds. Our AD-based approach (BG, TD) achieves much higher $\xi$ while closely matching the performance of the non-AD-based KT, PR, and TS, which demonstrates the superior efficiency of our method.
  • Figure 2: Number of trained tasks and performance $\rho$ as a function of total tasks processed so far, across different compatibility thresholds. Ribbons denote standard deviation across 5 different random seeds. Our AD-based methods (BG, TD) can closely match the performance of KT and PR with less training.
  • Figure 3: Same format as Figure 2, except we vary the policy bank size instead of the compatibility threshold. Performance $\rho$ generally decreases as more tasks are processed, and as bank size decreases, since this requires each policy to generalize to more tasks.