Task Specific Sharpness Aware O-RAN Resource Management using Multi Agent Reinforcement Learning
Fatemeh Lotfi, Hossein Rajoli, Fatemeh Afghah
TL;DR
This work tackles robust, scalable resource management for dynamic O-RAN networks by integrating Sharpness-Aware Minimization into a distributed SAC-based MARL framework (TA-SAM MARL). A TD-error variance–driven mechanism selectively applies SAM to actors and critics to promote flatter, more generalizable loss landscapes while dynamically scheduling the SAM radius $\rho$ to balance exploration and exploitation across heterogeneous slices. Empirical results show up to 22% gains in resource allocation efficiency and QoS satisfaction across eMBB, mMTC, and URLLC slices, with improved stability and reduced forgetting in non-stationary network conditions. The proposed approach demonstrates strong generalization, scalability, and resilience, making it well-suited for deployment in near-real-time O-RAN control loops, with future work focusing on latency-aware inference and policy compression for hardware accelerators.
Abstract
Next-generation networks use the Open Radio Access Network (O-RAN) architecture to enable dynamic resource management, facilitated by the RAN Intelligent Controller (RIC). While deep reinforcement learning (DRL) models show promise in optimizing network resources, they often struggle with robustness and generalizability in dynamic environments. This paper introduces a novel resource management approach that enhances the Soft Actor-Critic (SAC) algorithm with Sharpness-Aware Minimization (SAM) in a distributed Multi-Agent RL (MARL) framework. The method applies SAM adaptively and selectively: regularization is driven explicitly by temporal-difference (TD)-error variance, so that only agents facing high environmental complexity are regularized. This targeted strategy reduces unnecessary overhead, improves training stability, and enhances generalization without sacrificing learning efficiency. We further incorporate a dynamic $\rho$ scheduling scheme to refine the exploration-exploitation trade-off across agents. Experimental results show that our method significantly outperforms conventional DRL approaches, yielding up to a $22\%$ improvement in resource allocation efficiency and ensuring superior QoS satisfaction across diverse O-RAN slices.
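To make the selective mechanism concrete, the following is a minimal sketch of how a TD-error-variance gate and a scheduled SAM radius could fit together in a single parameter update. It is an illustration under stated assumptions, not the authors' implementation: the function names, the variance threshold, the linear decay of $\rho$, and the plain gradient-descent step are all hypothetical, and a toy quadratic loss stands in for the SAC actor and critic losses.

```python
import numpy as np

def sam_gradient(grad_fn, w, rho):
    """Standard SAM step: take the gradient at the perturbed point w + eps,
    where eps = rho * g / ||g|| points along the current gradient g."""
    g = grad_fn(w)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g  # already at a stationary point, no perturbation direction
    eps = rho * g / norm
    return grad_fn(w + eps)

def should_apply_sam(td_errors, threshold):
    """Hypothetical gate: regularize only when the variance of recent
    TD errors exceeds a threshold (the threshold value is an assumption)."""
    return float(np.var(td_errors)) > threshold

def rho_schedule(step, total_steps, rho_max=0.1, rho_min=0.01):
    """Hypothetical schedule: linear decay of the SAM radius.
    The paper's actual schedule is not specified in this section."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return rho_max + frac * (rho_min - rho_max)

def update(w, grad_fn, td_errors, step, total_steps, lr=0.1, threshold=0.5):
    """One gradient-descent step that uses SAM only when the gate fires."""
    if should_apply_sam(td_errors, threshold):
        g = sam_gradient(grad_fn, w, rho_schedule(step, total_steps))
    else:
        g = grad_fn(w)
    return w - lr * g

if __name__ == "__main__":
    grad = lambda w: 2.0 * w          # gradient of the toy loss ||w||^2
    w = np.array([3.0, 4.0])
    noisy_td = np.array([1.0, -1.0, 1.0, -1.0])   # high variance: SAM on
    calm_td = np.array([0.1, 0.1, 0.1])           # low variance: SAM off
    print(update(w, grad, noisy_td, step=0, total_steps=100))
    print(update(w, grad, calm_td, step=0, total_steps=100))
```

In this sketch an agent with calm TD errors pays for only one gradient evaluation per step, while an agent with noisy TD errors pays for two, which is the kind of overhead saving that selective application is meant to provide.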
