Rivaling Transformers: Multi-Scale Structured State-Space Mixtures for Agentic 6G O-RAN

Farhad Rezazadeh, Hatim Chergui, Merouane Debbah, Houbing Song, Dusit Niyato, Lingjia Liu

TL;DR

This work tackles the challenge of prediction services for proactive, agentic control in 6G O-RAN under near-real-time latency constraints. It introduces MS$^{3}$M, a strictly causal forecaster that mixes HiPPO-LegS state-space kernels across multiple time scales via depthwise convolution, SE gating, and a compact GLU mixer to forecast next-step KPI values with high efficiency. The model delivers Transformer-competitive accuracy (RMSE ≈ 0.292 dB, MAE ≈ 0.170 dB, $R^2 \approx 0.993$) while achieving substantially lower latency (≈0.057 s per inference) and a smaller footprint (≈0.70M parameters) on an O-RAN KPI dataset. The authors provide leakage-safe training and evaluation, a comprehensive complexity analysis, and an open-source implementation to facilitate deployment in Near-RT RIC xApps for anticipatory network control, highlighting MS$^{3}$M’s favorable accuracy–efficiency trade-off for real-time, edge-enabled KPI forecasting.
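The leakage-safe, next-step setup mentioned above can be illustrated with a minimal sketch. It assumes a chronological split, scaling statistics taken from the training segment only, and windows built inside each segment so that none straddles a split boundary. The function names, split fractions, window length, and synthetic data are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def make_windows(series, window, target_col):
    """Build (X, y) pairs: X[i] covers steps i..i+window-1 and y[i] is the
    target KPI at step i+window (the immediate next step)."""
    n = series.shape[0] - window
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:, target_col]
    return X, y

def leakage_safe_splits(series, window, target_col, train_frac=0.7, val_frac=0.15):
    """Split chronologically, scale with training statistics only, then window
    each segment separately (assumed protocol, for illustration)."""
    T = series.shape[0]
    t1 = int(round(T * train_frac))
    t2 = int(round(T * (train_frac + val_frac)))
    mean = series[:t1].mean(axis=0)          # statistics from the training segment only
    std = series[:t1].std(axis=0) + 1e-8
    scaled = (series - mean) / std
    segments = {"train": scaled[:t1], "val": scaled[t1:t2], "test": scaled[t2:]}
    return {name: make_windows(seg, window, target_col) for name, seg in segments.items()}

# Synthetic stand-in for a multivariate KPI trace with 13 channels.
rng = np.random.default_rng(0)
kpis = rng.normal(size=(1000, 13)).cumsum(axis=0)
splits = leakage_safe_splits(kpis, window=32, target_col=0)
X_train, y_train = splits["train"]
print(X_train.shape, y_train.shape)
```

Windowing after the split costs a few samples at each boundary, but it guarantees that no validation or test target appears inside a training input.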

Abstract

In sixth-generation (6G) Open Radio Access Networks (O-RAN), proactive control is preferable. A key open challenge is delivering control-grade predictions within Near-Real-Time (Near-RT) latency and computational constraints under multi-timescale dynamics. We therefore cast RAN Intelligent Controller (RIC) analytics as an agentic perceive-predict xApp that turns noisy, multivariate RAN telemetry into short-horizon per-User Equipment (UE) key performance indicator (KPI) forecasts to drive anticipatory control. In this regard, Transformers are powerful for sequence learning and time-series forecasting, but they are memory-intensive, which limits Near-RT RIC use. Therefore, we need models that maintain accuracy while reducing latency and data movement. To this end, we propose a lightweight Multi-Scale Structured State-Space Mixtures (MS3M) forecaster that mixes HiPPO-LegS kernels to capture multi-timescale radio dynamics. We develop stable discrete state-space models (SSMs) via bilinear (Tustin) discretization and apply their causal impulse responses as per-feature depthwise convolutions. Squeeze-and-Excitation gating dynamically reweights KPI channels as conditions change, and a compact gated channel-mixing layer models cross-feature nonlinearities without Transformer-level cost. The model is KPI-agnostic (Reference Signal Received Power, RSRP, serves as a canonical use case) and is trained on sliding windows to predict the immediate next step. Empirical evaluations are conducted using our bespoke O-RAN testbed KPI time-series dataset (59,441 windows across 13 KPIs). Crucially for O-RAN constraints, MS3M achieves a 0.057 s per-inference latency with 0.70M parameters, yielding 3-10x lower latency than the Transformer baselines evaluated on the same hardware, while maintaining competitive accuracy.
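The kernel pipeline described in the abstract (HiPPO-LegS dynamics, Tustin discretization, causal impulse responses applied as depthwise convolutions, mixed over several time scales) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the HiPPO-LegS matrices follow the standard published form, while the readout vector `C`, the step sizes in `scales`, the fixed mixture `weights`, and the choice to share one kernel across all features are placeholders for quantities the model would presumably learn.

```python
import numpy as np

def hippo_legs(N):
    """Standard HiPPO-LegS state matrix A (lower triangular) and input vector B."""
    p = np.sqrt(2.0 * np.arange(N) + 1.0)
    A = -(np.tril(np.outer(p, p)) - np.diag(np.arange(N)))
    return A, p.copy()

def tustin(A, B, dt):
    """Bilinear (Tustin) discretization of x' = A x + B u with step dt."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - 0.5 * dt * A)
    return inv @ (I + 0.5 * dt * A), inv @ (dt * B)

def ssm_kernel(Ad, Bd, C, length):
    """Causal impulse response K[k] = C Ad^k Bd for k = 0..length-1."""
    K, x = np.empty(length), Bd.copy()
    for k in range(length):
        K[k] = C @ x
        x = Ad @ x
    return K

def depthwise_causal_conv(X, K):
    """Convolve every feature column of X (T, F) with K, keeping only past inputs."""
    T = X.shape[0]
    return np.stack([np.convolve(X[:, f], K)[:T] for f in range(X.shape[1])], axis=1)

def multi_scale_features(X, A, B, C, scales, weights):
    """Weighted mixture of depthwise SSM convolutions at several step sizes."""
    out = np.zeros_like(X)
    for w, dt in zip(weights, scales):
        Ad, Bd = tustin(A, B, dt)
        out += w * depthwise_causal_conv(X, ssm_kernel(Ad, Bd, C, X.shape[0]))
    return out

rng = np.random.default_rng(0)
N, T, F = 8, 64, 13                      # state size, window length, KPI channels
A, B = hippo_legs(N)
C = rng.normal(size=N) / N               # hypothetical readout vector
X = rng.normal(size=(T, F))              # synthetic stand-in for one KPI window
scales = [0.01, 0.05, 0.2]               # assumed time scales
weights = np.full(len(scales), 1.0 / len(scales))
Y = multi_scale_features(X, A, B, C, scales, weights)
print(Y.shape)
```

Truncating the full convolution to its first `T` outputs is what makes the operation strictly causal: `Y[t]` depends only on `X[0..t]`.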

Paper Structure

This paper contains 44 sections, 2 theorems, 26 equations, 2 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

If $A_{\mathrm{ct}}$ is Hurwitz (all eigenvalues in the open left half-plane), then for any $\Delta t>0$, the discrete matrix $A(\Delta t)$ obtained by the bilinear (Tustin) discretization is Schur-stable: $\rho(A(\Delta t))<1$.
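For intuition, the result can be traced through the eigenvalue map of the bilinear transform. The sketch below assumes the standard Tustin form for $A(\Delta t)$; the paper's own equation is not reproduced on this page, so its notation may differ. Here $h = \Delta t/2$ and $\lambda = a + ib$ denotes an eigenvalue of $A_{\mathrm{ct}}$ with $a < 0$.

```latex
\begin{align}
A(\Delta t) &= \bigl(I - h A_{\mathrm{ct}}\bigr)^{-1}\bigl(I + h A_{\mathrm{ct}}\bigr),
  \qquad h = \tfrac{\Delta t}{2}, \\
\mu &= \frac{1 + h\lambda}{1 - h\lambda}
  \quad \text{is the eigenvalue of } A(\Delta t) \text{ associated with } \lambda, \\
|1 + h\lambda|^2 - |1 - h\lambda|^2
  &= (1 + ha)^2 - (1 - ha)^2 = 4ha < 0
  \;\Longrightarrow\; |\mu| < 1 .
\end{align}
```

The inverse exists because $\operatorname{Re}(1 - h\lambda) = 1 - ha > 1$ for every eigenvalue, so $I - hA_{\mathrm{ct}}$ has no zero eigenvalue. Since every $\mu$ lies strictly inside the unit disk, $\rho(A(\Delta t)) < 1$ for any $\Delta t > 0$.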

Figures (2)

  • Figure 1: Virginia Tech Innovation Campus O-RAN testbed setup [dai2025orankpi].
  • Figure 2: Comprehensive test-set diagnostics for the MS3M RSRP forecaster. (a) Ground truth vs. prediction for the last 1000 samples: the prediction closely tracks the measured RSRP with minimal phase lag and small amplitude error, illustrating stable short-horizon behavior on recent data. (b) Parity plot ($\hat{y}$ vs. $y$): points cluster tightly around the identity line, consistent with low error (annotated RMSE and MAE) and high explained variance ($R^2\approx 0.993$). (c) Residual distribution: histogram is narrow and approximately Gaussian, centered near zero, indicating low bias and a concentrated error profile in dBm. (d) Residual Q-Q: empirical quantiles align well with the theoretical normal line; only mild tail deviations are visible, suggesting near-normal residuals. (e) Residuals vs. predicted: no strong trend or funnel shape, indicating no pronounced heteroscedasticity across the predicted range. (f) $|\mathrm{Error}|$ CDF: the curve rises steeply, showing that a large fraction of samples have small absolute error (sub-dBm to low-dBm range), consistent with precise predictions. (g) $|\mathrm{Error}|$ boxplot: a compact interquartile range and low median reaffirm that typical errors are small. (h) Residual autocorrelation: most lags lie within the (approx.) 95% Bartlett band, indicating little remaining temporal structure in residuals (i.e., limited leftover predictability). (i) Permutation feature importance ($\Delta$RMSE in dBm): increases in RMSE after feature-wise permutation quantify sensitivity. Radio-quality indicators such as RSRP, RSSI, SINR, PMI, and CQI emerge among the most influential, followed by RSRQ, spectral-efficiency/coding/throughput descriptors (e.g., SE, RI, MCS, BLER), and scheduler/traffic context (e.g., PRBs, delay, buffer occupancy). These attributions are measured directly on the test set in original dBm units.
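Panel (i) reports permutation feature importance as an increase in RMSE. One way such a score could be computed is sketched below, assuming inputs of shape (samples, window, features) and a generic `predict` callable. The toy predictor and synthetic data are hypothetical stand-ins for the trained forecaster and the test set, and the paper's exact permutation protocol may differ.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when one feature is shuffled across samples.
    The same sample permutation is applied to every time step of that feature."""
    rng = np.random.default_rng(seed)
    base = rmse(y, predict(X))
    deltas = np.zeros(X.shape[-1])
    for f in range(X.shape[-1]):
        for _ in range(n_repeats):
            perm = rng.permutation(X.shape[0])
            Xp = X.copy()
            Xp[:, :, f] = X[perm][:, :, f]   # break the link between feature f and y
            deltas[f] += rmse(y, predict(Xp)) - base
    return deltas / n_repeats

# Toy stand-in: the target depends strongly on feature 0 and weakly on feature 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 16, 4))

def toy_predict(X):
    return X[:, -1, 0] + 0.1 * X[:, -1, 1]

y = toy_predict(X) + 0.05 * rng.normal(size=500)
importance = permutation_importance(toy_predict, X, y)
print(np.round(importance, 3))
```

Features the predictor never reads receive a score of exactly zero, which is a quick sanity check on any implementation of this diagnostic.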

Theorems & Definitions (3)

  • Proposition 1: Schur stability via bilinear transform
  • proof
  • Lemma 1: Exponential kernel decay
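The two results listed above can be checked numerically on a small example. Only the lemma's title appears on this page, so the sketch assumes the usual reading: once $A(\Delta t)$ is Schur-stable, the impulse response $K[k] = C A^k B$ is bounded by a geometric envelope. To keep that envelope exact, the example uses a symmetric negative-definite $A_{\mathrm{ct}}$, for which $\|A^k\|_2 = \rho^k$; this choice and all matrices and sizes below are illustrative and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dt, L = 6, 0.1, 50

# Symmetric negative-definite matrix: all eigenvalues <= -1, hence Hurwitz.
M = rng.normal(size=(N, N))
A_ct = -(M @ M.T) - np.eye(N)
B = rng.normal(size=N)
C = rng.normal(size=N)

# Bilinear (Tustin) discretization in its standard form.
I = np.eye(N)
inv = np.linalg.inv(I - 0.5 * dt * A_ct)
A_d = inv @ (I + 0.5 * dt * A_ct)
B_d = inv @ (dt * B)

# Spectral radius of the discrete matrix (Proposition 1 predicts rho < 1).
rho = float(np.max(np.abs(np.linalg.eigvals(A_d))))

# Impulse response K[k] = C A_d^k B_d.
K = np.empty(L)
x = B_d.copy()
for k in range(L):
    K[k] = C @ x
    x = A_d @ x

# Geometric envelope, valid here because A_d is symmetric.
envelope = np.linalg.norm(C) * np.linalg.norm(B_d) * rho ** np.arange(L)
print(rho < 1, bool(np.all(np.abs(K) <= envelope + 1e-12)))
```

For non-symmetric dynamics such as HiPPO-LegS, the same decay rate holds asymptotically, but the envelope generally needs a constant factor larger than $\|C\|\,\|B\|$.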