PPO-EPO: Energy and Performance Optimization for O-RAN Using Reinforcement Learning
Rawlings Ntassah, Gian Michele Dell'Aera, Fabrizio Granelli
TL;DR
This work addresses Open RAN energy efficiency with a PPO-based reinforcement learning framework for traffic steering and selective cell shutdown that accounts for throughput, interference, and PRB constraints. The objective combines throughput and energy gains via $\max_{x_k} \sum_{k=1}^{K} x_k \left(\omega_{\text{perf}} G_{\text{perf},k} + \omega_{\text{power}} P_{\text{gain},k}\right)$, with total energy efficiency defined as $EE_{\text{total}} = \frac{R_{\text{avrg}}}{P_{\text{avrg}}}$; constraints ensure that post-shutdown performance and resource limits are respected. The model is trained on a Turin-inspired CU–DU–RU topology with 12 RUs for 5 million PPO episodes and then validated on real data from the Viavi RIC Tester. In these experiments, PPO outperforms the SARSA and Random baselines, delivering higher downlink throughput and lower energy consumption while honoring interference thresholds. The work underscores the practicality of RL for energy-aware O-RAN management and points to future directions such as Hybrid and Hierarchical RL for more robust, predictive traffic control.
Abstract
Energy consumption in mobile communication networks has become a significant challenge due to its direct impact on Capital Expenditure (CAPEX) and Operational Expenditure (OPEX). The introduction of Open RAN (O-RAN) enables telecommunication providers to leverage network intelligence to optimize energy efficiency while maintaining Quality of Service (QoS). One promising approach involves traffic-aware cell shutdown strategies, where underutilized cells are selectively deactivated without compromising overall network performance. However, achieving this balance requires precise traffic steering mechanisms that account for throughput performance, power efficiency, and network interference constraints. This work proposes a reinforcement learning (RL) model based on the Proximal Policy Optimization (PPO) algorithm to optimize traffic steering and energy efficiency. The objective is to maximize energy efficiency and performance gains while strategically shutting down underutilized cells. The proposed RL model learns adaptive shutdown policies subject to throughput degradation constraints, interference thresholds, and a balanced utilization of physical resource blocks (PRBs). Experimental validation using TeraVM Viavi RIC Tester data demonstrates that our method significantly improves the network's energy efficiency and downlink throughput.
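To make the weighted shutdown objective and the energy-efficiency ratio from the TL;DR concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that $x_k = 1$ means cell $k$ is selected for shutdown, and it replaces the paper's throughput, interference, and PRB constraints with a single hypothetical PRB budget (`prb_load`, `prb_budget`). The function names and all numeric values are illustrative assumptions, and the exhaustive search stands in for the PPO policy only to show what is being maximized.

```python
from itertools import product


def shutdown_objective(x, g_perf, p_gain, w_perf=0.5, w_power=0.5):
    """Weighted sum over cells: x_k * (w_perf * G_perf_k + w_power * P_gain_k)."""
    return sum(
        xk * (w_perf * g + w_power * p)
        for xk, g, p in zip(x, g_perf, p_gain)
    )


def energy_efficiency(r_avg, p_avg):
    """EE_total = R_avrg / P_avrg (average throughput over average power)."""
    return r_avg / p_avg


def best_shutdown(g_perf, p_gain, prb_load, prb_budget, w_perf=0.5, w_power=0.5):
    """Brute-force search over binary shutdown vectors (illustration only).

    Hypothetical constraint: the PRB load of the cells being shut down
    must fit within prb_budget, a stand-in for the paper's constraints.
    """
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(g_perf)):
        shifted_load = sum(xk * load for xk, load in zip(x, prb_load))
        if shifted_load > prb_budget:
            continue  # infeasible under the assumed PRB budget
        val = shutdown_objective(x, g_perf, p_gain, w_perf, w_power)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Illustrative (made-up) per-cell gains for three cells.
g_perf = [-2.0, -1.0, -4.0]   # performance gain if the cell is shut down
p_gain = [6.0, 4.0, 2.0]      # power gain if the cell is shut down
prb_load = [30, 50, 10]       # PRB load that must be absorbed by neighbours

choice, value = best_shutdown(g_perf, p_gain, prb_load, prb_budget=60)
print(choice, value)
print(energy_efficiency(r_avg=120.0, p_avg=40.0))
```

Exhaustive search is only practical for a handful of cells, since the number of candidate vectors grows as $2^K$; a learned policy such as PPO is one way to avoid enumerating them.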
