Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning

Michail Kalntis, George Iosifidis, Fernando A. Kuipers

TL;DR

This paper tackles robust resource thresholding for virtualized base stations in O-RAN under non-stationary conditions by casting non-RT policy configuration as an online learning problem. It introduces BSvBS, an adversarial bandit algorithm with sub-linear regret, and MetBS, a meta-learning framework that dynamically selects the best-performing algorithm for a given environment. The authors provide formal regret guarantees, analyze computational overhead, and validate the approach on real-world traces, achieving up to 64.5% energy savings and sub-linear regret across static, stationary, and adversarial scenarios. The work enables practical, low-overhead, robust vBS management in multi-tier O-RAN architectures and suggests a roadmap for integrating such learning-driven policies into live RIC ecosystems.

Abstract

Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability. Optimizing the allocation of resources in a vBS is challenging since it requires knowledge of the environment (i.e., "external" information), such as traffic demands and channel quality, which is difficult to acquire precisely over short intervals of a few seconds. To tackle this problem, we propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging" environments, for instance, non-stationary or adversarial traffic demands. We also develop a meta-learning scheme, which leverages the power of other algorithmic approaches tailored for "easier" environments and dynamically chooses the best-performing one, thus enhancing the overall system's versatility and effectiveness. We prove that the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments. The performance of the algorithms is evaluated with real-world data and various trace-driven evaluations, indicating savings of up to 64.5% in the power consumption of a vBS compared with state-of-the-art benchmarks.
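
To make the step from sub-linear regret to a zero average optimality gap explicit: with $R_T$ denoting the regret accumulated over $T$ rounds (the notation used in the figure captions), sub-linear regret means $R_T = o(T)$, so the per-round gap $R_T/T$ is bounded above by a quantity that vanishes as $T$ grows. As an illustration only, assume a bound of the form $R_T \le C\sqrt{T}$ for some constant $C > 0$ (a hypothetical constant, not a value taken from the paper):

$$
\frac{R_T}{T} \le \frac{C\sqrt{T}}{T} = \frac{C}{\sqrt{T}} \xrightarrow[T \to \infty]{} 0 .
$$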

Paper Structure

This paper contains 26 sections, 2 theorems, 26 equations, 9 figures, and 2 algorithms.

Key Result

Lemma 1

Let $T > 0$ be a fixed time horizon. Set input parameter $\gamma = \min\left\{1, \sqrt{{|\mathcal{X}| \ln{|\mathcal{X}|}}/{((e-1)T)}}\right\}$. Then, running Algorithm 1 ensures that the expected regret is:
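
The exploration parameter $\gamma$ in Lemma 1 has the same form as the standard tuning of the Exp3 adversarial bandit algorithm. The sketch below computes that $\gamma$ and plugs it into a textbook Exp3 loop. It is a minimal illustration under stated assumptions, not the paper's Algorithm 1: rewards are assumed to lie in $[0, 1]$, and the names `exp3_gamma`, `exp3_probabilities`, `run_exp3`, and the toy reward function are hypothetical.

```python
import math
import random


def exp3_gamma(num_policies, horizon):
    """Exploration rate from Lemma 1: min{1, sqrt(|X| ln|X| / ((e - 1) T))}."""
    ratio = num_policies * math.log(num_policies) / ((math.e - 1) * horizon)
    return min(1.0, math.sqrt(ratio))


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration (textbook Exp3)."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def run_exp3(reward_fn, num_policies, horizon, seed=0):
    """Run textbook Exp3 and return how often each policy was selected.

    reward_fn(t, policy) is a hypothetical callback that must return a
    reward in [0, 1]; only the chosen policy's reward is observed.
    """
    rng = random.Random(seed)
    gamma = exp3_gamma(num_policies, horizon)
    weights = [1.0] * num_policies
    counts = [0] * num_policies
    for t in range(horizon):
        probs = exp3_probabilities(weights, gamma)
        chosen = rng.choices(range(num_policies), weights=probs)[0]
        reward = reward_fn(t, chosen)
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[chosen]
        weights[chosen] *= math.exp(gamma * estimate / num_policies)
        counts[chosen] += 1
    return counts
```

For example, `run_exp3(lambda t, a: 1.0 if a == 2 else 0.0, num_policies=4, horizon=2000)` returns per-policy selection counts; with this toy reward the counts should concentrate on policy 2. Note that `exp3_gamma` is capped at 1 when the horizon is short relative to the number of policies, in which case the sampling distribution is purely uniform.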

Figures (9)

  • Figure 1: O-RAN-compliant architecture & policy workflow. (a) The proposed policy operates in the Non-RT RIC and decides MCS, power, and PRB thresholds that are sent to each vBS's scheduler. The key building blocks are the Non-RT RIC, hosted by the Service Management and Orchestration (SMO) framework, and the Near-RT RIC. The system has three control loops: (i) Non-RT, which involves large-timescale operations with execution time ≥ 1s, (ii) Near-RT (>10ms), and (iii) RT (≤ 10ms). (b) Policy flow for the Non-RT RIC with (bottom) and without (top) an rApp implementing a meta-learner.
  • Figure 2: (a) $R_T$ achieved by BSvBS in Scenario A (static) and its upper bound; (b) heatmap for the choices of BSvBS in Scenario A, showing the probability that each policy is chosen at $t=50k$.
  • Figure 3: Scenario A (static) for BSvBS: (a) MCS in DL (left) / UL (right); (b) PRB ratio in DL (left) / UL (right); (c) power (left) and utility (right) w.r.t. $\delta$, with 0.95-CI. In each plot, the blue and green lines correspond to the left and right y-axis, respectively.
  • Figure 4: $R_T/T$ for BSvBS in Scenario B (stationary), together with Random, a naive algorithm that selects policies randomly.
  • Figure 5: BP-vRAN executed for $T=1000$ rounds in dynamic Scenario C, in a subset of the policy space: (a) $R_T/T$; (b) number of times each policy is chosen.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Lemma 2
  • proof