Meta Reinforcement Learning Approach for Adaptive Resource Optimization in O-RAN
Fatemeh Lotfi, Fatemeh Afghah
TL;DR
This paper tackles adaptive resource optimization in O-RAN by proposing a Meta-DRL framework, inspired by Model-Agnostic Meta-Learning (MAML), that jointly allocates resource blocks (RBs) and downlink power. The problem is modeled as a Markov decision process (MDP) in which distributed DRL agents at virtual DUs use the continuous-action Deep Deterministic Policy Gradient (DDPG) algorithm, coordinated by a meta-controller in the RIC that enables rapid adaptation to changing network conditions. A novel reward structure balances QoS, power consumption, and RB usage, and meta-training across multiple tasks promotes fast adaptation to new environments. The authors report a 19.8% improvement in network management performance over baselines, indicating better generalization and responsiveness in highly dynamic O-RAN settings.
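The summary above does not give the exact form of the reward, so the following is only a minimal sketch of how a reward could trade off QoS against power consumption and RB usage. It assumes a simple weighted sum; the names `RewardWeights` and `reward`, the weight values, and the normalization are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical trade-off weights (illustrative values, not from the paper)."""
    qos: float = 1.0
    power: float = 0.5
    rb: float = 0.5


def _fraction(used: float, total: float) -> float:
    """Return used/total clipped to [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    return min(max(used / total, 0.0), 1.0)


def reward(qos_satisfaction: float,
           power_used: float, power_max: float,
           rbs_used: int, rbs_total: int,
           w: RewardWeights = RewardWeights()) -> float:
    """Assumed weighted-sum reward: reward QoS, penalize power and RB usage.

    qos_satisfaction is assumed to be a score in [0, 1], e.g. the share of
    users whose QoS target is met.
    """
    power_frac = _fraction(power_used, power_max)
    rb_frac = _fraction(rbs_used, rbs_total)
    return w.qos * qos_satisfaction - w.power * power_frac - w.rb * rb_frac
```

Under these assumptions, an agent that meets the same QoS score with less transmit power or fewer RBs receives a strictly higher reward, which is the balancing behavior the summary describes.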
Abstract
As wireless networks grow to support more complex applications, the Open Radio Access Network (O-RAN) architecture, with its smart RAN Intelligent Controller (RIC) modules, becomes a crucial solution for real-time network data collection, analysis, and dynamic management of network resources including radio resource blocks and downlink power allocation. Utilizing artificial intelligence (AI) and machine learning (ML), O-RAN addresses the variable demands of modern networks with unprecedented efficiency and adaptability. Despite progress in using ML-based strategies for network optimization, challenges remain, particularly in the dynamic allocation of resources in unpredictable environments. This paper proposes a novel Meta Deep Reinforcement Learning (Meta-DRL) strategy, inspired by Model-Agnostic Meta-Learning (MAML), to advance resource block and downlink power allocation in O-RAN. Our approach leverages O-RAN's disaggregated architecture with virtual distributed units (DUs) and Meta-DRL strategies, enabling adaptive and localized decision-making that significantly enhances network efficiency. By integrating meta-learning, our system quickly adapts to new network conditions, optimizing resource allocation in real time. This results in a 19.8% improvement in network management performance over traditional methods, advancing the capabilities of next-generation wireless networks.
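To make the MAML-inspired idea concrete, here is a minimal sketch of the inner-adaptation and outer meta-update pattern. It is a toy under stated assumptions, not the authors' implementation: each "task" is a scalar quadratic loss standing in for one network condition, the meta-gradient uses the first-order approximation of MAML, and the DDPG agents are not modeled. The function names and the step sizes `alpha` and `beta` are hypothetical.

```python
def loss(theta: float, target: float) -> float:
    """Toy per-task loss: squared distance to the task's optimum."""
    return (theta - target) ** 2


def loss_grad(theta: float, target: float) -> float:
    """Gradient of the toy loss with respect to theta."""
    return 2.0 * (theta - target)


def inner_adapt(theta: float, target: float,
                alpha: float = 0.1, steps: int = 1) -> float:
    """Inner loop: a few gradient steps on one task, starting from theta."""
    for _ in range(steps):
        theta = theta - alpha * loss_grad(theta, target)
    return theta


def meta_train(task_targets: list[float], theta: float = 0.0,
               alpha: float = 0.1, beta: float = 0.05,
               iterations: int = 500) -> float:
    """Outer loop: move the shared initialization so that one inner step
    works well on every task (first-order MAML approximation)."""
    for _ in range(iterations):
        meta_grad = 0.0
        for target in task_targets:
            adapted = inner_adapt(theta, target, alpha)
            # First-order approximation: use the gradient evaluated at the
            # adapted parameters and ignore second-order terms.
            meta_grad += loss_grad(adapted, target)
        theta -= beta * meta_grad / len(task_targets)
    return theta
```

In this toy setting, `meta_train([1.0, 3.0, 5.0])` settles at the mean of the task optima, and starting from that initialization, a few calls to `inner_adapt` on an unseen task such as `4.0` end closer to its optimum than the same number of steps from an untrained start. This mirrors, in a very simplified form, the fast adaptation to new network conditions that the abstract describes.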
