Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning

Farhad Rezazadeh, Hatim Chergui, Merouane Debbah, Houbing Song, Dusit Niyato, Lingjia Liu

TL;DR

This work reframes open radio access network (O-RAN) near-real-time (Near-RT) control through counterfactual dynamics and a world modeling (WM) paradigm that learns an action-conditioned generative state space, enabling quantitative "what-if" forecasting and moving beyond large language models (LLMs) as the primary modeling primitive.
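
To make "action-conditioned what-if forecasting" concrete, the sketch below rolls a toy state-space model forward under a factual and a counterfactual PRB sequence and reports the predicted KPI gap. It is not the WM-MS3M architecture: the linear dynamics, the matrices `A`, `B`, `C`, the noise level `SIGMA`, and the helper names `rollout` and `what_if` are hypothetical stand-ins chosen for illustration. The Gaussian state noise only mimics aleatoric uncertainty; epistemic uncertainty would need something extra, such as an ensemble of models.

```python
import numpy as np

# Hypothetical toy world model (NOT WM-MS3M): a linear latent state driven by a PRB action.
#   z_{t+1} = A z_t + B u_t + w_t,   y_t = C z_t,   w_t ~ N(0, SIGMA^2 I)
# The noise term w_t stands in for aleatoric uncertainty only.
A = np.array([[0.9, 0.0],
              [0.1, 0.8]])
B = np.array([[0.05],
              [0.0]])
C = np.eye(2)  # two illustrative KPIs read directly from the latent state
SIGMA = 0.02


def rollout(z0, prbs, rng=None):
    """Predict KPIs step by step under a PRB sequence.

    With rng=None the noise is skipped, giving the mean rollout;
    with an rng, one stochastic sample path is drawn.
    """
    z = np.asarray(z0, dtype=float)
    ys = []
    for u in prbs:
        z = A @ z + B @ np.array([u], dtype=float)
        if rng is not None:
            z = z + rng.normal(0.0, SIGMA, size=z.shape)
        ys.append(C @ z)
    return np.array(ys)


def what_if(z0, factual, counterfactual, n_samples=200, seed=0):
    """Mean KPI gap (counterfactual minus factual) and per-step spread of sampled counterfactual paths."""
    rng = np.random.default_rng(seed)
    mean_gap = rollout(z0, counterfactual) - rollout(z0, factual)
    samples = np.stack([rollout(z0, counterfactual, rng) for _ in range(n_samples)])
    return mean_gap, samples.std(axis=0)


# Toy usage: what if we had allocated 20 PRBs per step instead of 10?
gap, spread = what_if(z0=[0.0, 0.0], factual=[10] * 4, counterfactual=[20] * 4)
```

Because this toy model is linear, the mean gap depends only on the difference between the two PRB sequences; a learned nonlinear world model would generally make the gap depend on the current state as well, which is where counterfactual forecasting becomes informative.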

Abstract

We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose: to simulate future scenarios, weigh trade-offs, and act with calibrated uncertainty. We reframe open radio access network (O-RAN) near-real-time (Near-RT) control via counterfactual dynamics and a world modeling (WM) paradigm that learns an action-conditioned generative state space. This enables quantitative "what-if" forecasting and moves beyond large language models (LLMs) as the primary modeling primitive. Actions such as physical resource block (PRB) allocations are treated as first-class control inputs in a causal world model, and both aleatoric and epistemic uncertainty are modeled for prediction and what-if analysis. An agentic planner based on model predictive control (MPC) and the cross-entropy method (CEM) operates over short horizons, using prior-mean rollouts within data-driven PRB bounds to maximize a deterministic reward. The model couples multi-scale structured state-space mixtures (MS3M) with a compact stochastic latent to form WM-MS3M, which summarizes key performance indicator (KPI) histories and predicts next-step KPIs under hypothetical PRB sequences. On realistic O-RAN traces, WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention-based and hybrid baselines with 2.3-4.1x faster inference, enabling rare-event simulation and offline policy screening.
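
The planner described above can be sketched as a cross-entropy method (CEM) search inside a model predictive control (MPC) loop. This is a minimal illustration, not the authors' algorithm: the scalar `toy_dynamics`, the `make_reward` trade-off between a KPI and PRB usage, the PRB bounds (5 to 50), and the hyperparameters (64 candidates, 8 elites, 5 iterations) are all assumptions; the horizon of 8 simply mirrors the H = 8 used in Figure 3. In the paper's setting, `dynamics` would be the world model's prior-mean prediction and the bounds would come from the data.

```python
import numpy as np


def plan_score(dynamics, reward, z0, prbs):
    """Deterministic return of one PRB sequence under a prior-mean (noise-free) rollout."""
    z, total = np.asarray(z0, dtype=float), 0.0
    for u in prbs:
        z = dynamics(z, u)  # mean prediction only; no latent noise is sampled
        total += reward(z, u)
    return total


def cem_mpc_action(dynamics, reward, z0, horizon=8, prb_low=5.0, prb_high=50.0,
                   n_candidates=64, n_elite=8, n_iters=5, seed=0):
    """Cross-entropy method over PRB sequences; returns (first action, refined plan)."""
    rng = np.random.default_rng(seed)
    mu = np.full(horizon, (prb_low + prb_high) / 2.0)
    std = np.full(horizon, (prb_high - prb_low) / 4.0)
    for _ in range(n_iters):
        # Sample candidate PRB trajectories and clip them to the PRB bounds.
        cands = np.clip(rng.normal(mu, std, size=(n_candidates, horizon)), prb_low, prb_high)
        scores = np.array([plan_score(dynamics, reward, z0, seq) for seq in cands])
        # Refit the sampling distribution to the highest-scoring (elite) candidates.
        elite = cands[np.argsort(scores)[-n_elite:]]
        mu = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3  # small floor keeps the search from collapsing
    # MPC: only the first action is applied; the planner re-runs at the next control step.
    return float(mu[0]), mu


# Hypothetical scalar toy dynamics and reward (assumptions, not the paper's).
def toy_dynamics(z, u):
    return 0.8 * z + 0.05 * u


def make_reward(prb_price):
    # Reward = KPI level minus a linear cost per allocated PRB.
    return lambda z, u: float(z[0]) - prb_price * u
```

Raising `prb_price` pushes the chosen first action toward the lower PRB bound, because each extra PRB then costs more reward than it returns through the KPI; with `prb_price=0.0` the search drifts toward the upper bound instead.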

Paper Structure

This paper contains 48 sections, 34 equations, 3 figures, 4 tables, and 3 algorithms.

Figures (3)

  • Figure 1: Agentic world-modeling pipeline in the O-RAN Near-RT RIC. Aggregated KPIs and PRB actions from the E2 node feed the WM--MS$^{3}$M world model, which (i) provides calibrated one-step and short-horizon forecasts for factual or counterfactual PRB sequences, and (ii) serves as the dynamics backbone for an MPC/CEM planner that evaluates candidate PRB trajectories and outputs the next Near-RT control action.
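  • Figure 2: Ground truth (solid) vs. one-step model prediction (dashed) over the last 500 test samples for four KPIs: (a) reference signal received power (RSRP), (b) signal-to-interference-plus-noise ratio (SINR), (c) channel quality indicator (CQI), and (d) PRBs. Curves are shown in original units.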
  • Figure 3: Reward comparison across PRB control scenarios over horizon $H\!=\!8$. (a) Total cumulative reward per scenario (higher is better). (b) Per-step reward trajectories showing when gains/losses occur.
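
The accuracy figures in the abstract and the one-step curves of Figure 2 rest on standard error metrics. The sketch below computes MAE and RMSE, the percentage error reduction of a model relative to a baseline, and an inference speedup ratio. It assumes the KPIs were standardized with a per-KPI mean and standard deviation before training (so the hypothetical helper `to_original_units` simply inverts that scaling) and that the reported percentages are relative reductions; both are assumptions rather than details confirmed here.

```python
import numpy as np


def to_original_units(y_scaled, mean, std):
    """Invert per-KPI standardization (assumed preprocessing): y = y_scaled * std + mean."""
    return np.asarray(y_scaled, dtype=float) * np.asarray(std, dtype=float) + np.asarray(mean, dtype=float)


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def relative_reduction(baseline_err, model_err):
    """Percent by which model_err is lower than baseline_err."""
    return 100.0 * (baseline_err - model_err) / baseline_err


def speedup(baseline_latency, model_latency):
    """How many times faster the model is than the baseline (e.g. 2.3 means 2.3x)."""
    return baseline_latency / model_latency
```

Read this way, an RMSE that is 35% to 80% lower than a baseline's corresponds to `relative_reduction` values of 35 to 80, that is, a model RMSE between 0.65 and 0.20 times the baseline RMSE.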