ScalO-RAN: Energy-aware Network Intelligence Scaling in Open RAN
Stefano Maxenti, Salvatore D'Oro, Leonardo Bonati, Michele Polese, Antonio Capone, Tommaso Melodia
TL;DR
ScalO-RAN addresses latency-guaranteed, energy-efficient scaling of AI-powered O-RAN applications in multi-tenant Open RAN deployments. It combines a data-driven, two-segment piecewise linear latency model with a mixed-integer optimization problem (ISP) that jointly decides which servers to activate and where to place xApps, rApps, and dApps, so that per-application latency and demand requirements are met while profit is maximized and energy consumption is minimized. The approach is validated through an OpenShift prototype and numerical experiments, which show that latency constraints drive scaling decisions and that ScalO-RAN achieves lower energy consumption and better latency compliance than traditional load-balancing approaches. The work advances practical, energy-conscious orchestration of cloudified RAN control planes, with implications for scalable 5G and 6G deployments in multi-tenant, multi-vendor environments.
Abstract
Network virtualization, software-defined infrastructure, and orchestration are pivotal elements in contemporary networks, yielding new vectors for optimization and novel capabilities. In line with these principles, O-RAN presents an avenue to bypass vendor lock-in, circumvent vertical configurations, enable network programmability, and facilitate integrated Artificial Intelligence (AI) support. Moreover, modern container orchestration frameworks (e.g., Kubernetes, Red Hat OpenShift) simplify the way cellular base stations, as well as the newly introduced RAN Intelligent Controllers (RICs), are deployed, managed, and orchestrated. While this enables cost reduction via infrastructure sharing, it also makes it more challenging to meet O-RAN control latency requirements, especially during peak resource utilization. To address this problem, we propose ScalO-RAN, a control framework rooted in optimization and designed as an O-RAN rApp that allocates and scales AI-based O-RAN applications (xApps, rApps, dApps) to: (i) abide by application-specific latency requirements, and (ii) monetize the shared infrastructure while reducing energy consumption. We prototype ScalO-RAN on an OpenShift cluster with base stations, RIC, and a set of AI-based xApps deployed as micro-services. We evaluate ScalO-RAN both numerically and experimentally. Our results show that ScalO-RAN can optimally allocate and distribute O-RAN applications within available computing nodes to accommodate even stringent latency requirements. More importantly, we show that scaling O-RAN applications is primarily a time-constrained problem rather than a resource-constrained one, where scaling policies must account for stringent inference time of AI applications, and not only how many resources they consume.
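To illustrate the claim that scaling is time-constrained rather than resource-constrained, the following minimal Python sketch pairs a two-segment piecewise linear latency model with a brute-force placement search. It is not the paper's optimization problem: the names latency_ms and place, the knee, base, and slope values, and the per-server capacity are all hypothetical, and the number of active servers is used as a simple proxy for energy (the profit term is omitted).

```python
from itertools import product


def latency_ms(n_apps, knee=4, base=2.0, slope_low=0.5, slope_high=3.0):
    """Two-segment piecewise linear inference latency (ms) versus the number
    of applications co-located on one server. All parameters are hypothetical."""
    if n_apps <= knee:
        return base + slope_low * n_apps
    # Past the knee, latency grows with the steeper second slope.
    return base + slope_low * knee + slope_high * (n_apps - knee)


def place(deadlines_ms, n_servers, capacity):
    """Exhaustively assign each app to a server and keep the feasible
    assignment that activates the fewest servers.

    Returns (active_servers, assignment), or None if nothing is feasible.
    Exponential in the number of apps; for illustration only."""
    best = None
    for assignment in product(range(n_servers), repeat=len(deadlines_ms)):
        load = [assignment.count(s) for s in range(n_servers)]
        # Resource constraint: at most `capacity` apps per server.
        if any(l > capacity for l in load):
            continue
        # Time constraint: every app must meet its own latency deadline.
        if any(latency_ms(load[s]) > deadlines_ms[i]
               for i, s in enumerate(assignment)):
            continue
        active = sum(1 for l in load if l > 0)
        if best is None or active < best[0]:
            best = (active, assignment)
    return best


if __name__ == "__main__":
    # Six apps fit on one server by capacity (8), but a 5 ms deadline
    # forces a second server to be activated.
    print(place([5.0] * 6, n_servers=3, capacity=8))
    print(place([20.0] * 6, n_servers=3, capacity=8))
```

With these assumed parameters, a single server has enough capacity for all six applications, yet the 5 ms deadline forces the search to activate a second server. Relaxing the deadline to 20 ms lets everything consolidate on one server.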
