AtlasRAN: Modeling and Performance Evaluation of Open 5G Platforms for Ubiquitous Wireless Networks

Ryan Barker, Tolunay Seyfi, Alireza Ebrahimi Dorcheh, Julia Boone, Fatemeh Afghah, Joseph Boccuzzi

Abstract

Fifth-generation (5G) systems are increasingly studied as shared communication and computing infrastructure for connected vehicles, roadside edge platforms, and future unmanned-system applications. Yet results from simulators, host-OS emulators, digital twins, and hardware-in-the-loop testbeds are often compared as if timing, input/output (I/O), and control-loop behavior were equivalent across them. They are not. Consequently, apparent limits in throughput, latency, scalability, or real-time behavior may reflect the execution harness rather than the wireless design itself. This paper presents AtlasRAN, a capability-oriented framework for modeling and performance evaluation of 5G Open Radio Access Network (O-RAN) platforms. It introduces two reference architectures, terminology that separates functional compatibility from timing fidelity, and a capability matrix that maps research questions to evaluation environments that can support them credibly. O-RAN is used here as an experimental coordinate system spanning Centralized Unit (CU)/Distributed Unit (DU) partitioning, fronthaul transport, control exposure, and core-network anchoring. We validate AtlasRAN through a CU-DU uplink load study on a coherent CPU-GPU edge platform. For both a CPU-only baseline and a GPU-accelerated low-density parity-check decoding variant, aggregate goodput drops sharply as user count rises from 1 to 12, while fairness remains near ideal and compute utilization decreases rather than increases. This pattern indicates time-scale dilation and online I/O starvation in the emulation harness, not decoder saturation, as the dominant scaling limit. The key lesson is that timing, memory, and transport semantics must be reported as first-class experimental variables when evaluating ubiquitous 5G infrastructure.

Paper Structure (32 sections, 2 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Host-based CPU-centric 5G execution regimes. Top: WG4 fronthaul Split 7.2x, in which an O-DU-low hosted on general-purpose compute implements high-PHY and time-critical MAC functions and exchanges OFH/eCPRI with an O-RU implementing low-PHY/RF; the same panel also highlights software-only RF/channel abstractions used for non-real-time emulation. Bottom: Split 8, in which the full PHY remains in the host gNB/CU/DU stack and an SDR acts as the RF front end, potentially with FPGA-based assist functions. The shared 5GC, SMO/RIC, and higher-layer RAN substrate make clear that the main experimental distinction lies in the lower endpoint and thus in which timing, transport, synchronization, and interoperability constraints are actually exposed.
  • Figure 2: Heterogeneous, accelerator-aware AI-RAN workflow linking differentiable physics engines (e.g., Sionna/Sionna-RT/SRK) to code-realistic twins and deployment-grade runtimes (e.g., ARC-OTA/ACAR and AODT), enabling promote-tune-debug iteration across offline, twin, and real-time execution.
  • Figure 3: CU-DU ablation architecture and monitoring pipeline for the RFSim load study on a common NVIDIA DGX Spark host (Grace CPU + Blackwell GPU, 128 GB NVLink-C2C coherent memory). The DU runs OAI RFSim (UE/RU-PHY abstraction) and exchanges F1 control/user-plane traffic with the CU; we compare baseline OAI (vectorized CPU LDPC) against SRK (CUDA LDPC) with wall-clock timing instrumentation around the decode region. CU/DU threads are pinned to fixed CPU core sets (CU: 6–7; DU: 8–11). Uplink load is generated by iperf3 and KPIs are exported over a ZMQ monitor to an xApp-style subscriber before post-processing. This figure depicts a host-OS emulation harness intended for controlled ablations, not a fronthaul-faithful software/cyber-physical twin.
  • Figure 4: Headline result: aggregate UL goodput $T_{\mathrm{total}}$ vs. UE count $N\in\{1,3,6,12\}$ for OAI (CPU LDPC) and SRK (CUDA LDPC), with power-law overlays fitted in log space. Fits: $T_{\mathrm{OAI}}(N)\approx129.08\cdot N^{-0.7759}$ ($R^2_{\log}{=}0.9668$) and $T_{\mathrm{SRK}}(N)\approx120.07\cdot N^{-0.7406}$ ($R^2_{\log}{=}0.9446$).
  • Figure 5: Representative excerpt from the OAI DU log stream showing successive CPU LDPC decoder timing prints (extracted verbatim from du_logs.tsv; ellipses denote omitted surrounding lines). Each print reports the average wall-clock latency of nrLDPC_coding_decoder(&slot_parameters) over a rate-limited window. In the CPU backend, this call implements a MIMD$\times$SIMD strategy: code-block segments are dispatched as independent tasks onto the DU thread pool (4 workers in our runs), while each worker executes SIMD-vectorized belief-propagation (CN/VN) update kernels. The parenthesized value normalizes the measured wall-clock latency by the decoded code-block segment count; because segments are processed in parallel, this per-segment value is a convenience throughput/normalization metric rather than the serial latency of a single segment.
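
The log-space power-law fit named in the Figure 4 caption can be illustrated with a minimal sketch. It assumes an ordinary least-squares line fitted to (log N, log T), which is one common way to fit in log space; the caption does not state the exact procedure used. The helper names fit_power_law and power_law are hypothetical, and the input points are synthetic values generated from the caption's fitted OAI curve, not the measured goodput.

```python
import math

def power_law(n, a, b):
    """Evaluate T(N) = a * N**b."""
    return a * n ** b

def fit_power_law(ns, ts):
    """Least-squares fit of T(N) = a * N**b on log-transformed data.

    Returns (a, b, r2_log), where r2_log is the coefficient of
    determination computed on the log-transformed values.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log space = exponent
    log_a = mean_y - b * mean_x        # intercept in log space = log(a)
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot

UE_COUNTS = [1, 3, 6, 12]        # N values from the Figure 4 caption
OAI_FIT = (129.08, -0.7759)      # fitted (a, b) quoted in the caption
SRK_FIT = (120.07, -0.7406)      # fitted (a, b) quoted in the caption

# ASSUMPTION: synthetic points taken from the fitted curve itself,
# standing in for the measured goodput, which is not reproduced here.
synthetic_oai = [power_law(n, *OAI_FIT) for n in UE_COUNTS]
a_hat, b_hat, r2_log = fit_power_law(UE_COUNTS, synthetic_oai)
print(f"a={a_hat:.2f}, b={b_hat:.4f}, R2_log={r2_log:.4f}")
```

Because the synthetic points lie exactly on the curve, the fit recovers the quoted coefficients. With measured data the same routine would return a log-space R² below 1, as reported in the caption. An exponent between -1 and 0 means aggregate goodput falls as N grows, but more slowly than 1/N.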
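
The Figure 5 caption notes that the per-segment value is a normalization, not the serial latency of one segment. The following sketch shows why under an idealized model, assuming equal-cost segments processed in full batches across the worker pool with no scheduling overhead. Only the worker count of 4 comes from the caption; the function names, the 40 µs segment cost, and the segment count are hypothetical illustration values.

```python
import math

def wall_clock_us(segments, workers, serial_segment_us):
    """Idealized wall-clock time: segments run in batches of `workers`."""
    return math.ceil(segments / workers) * serial_segment_us

def per_segment_us(wall_us, segments):
    """Normalization in the caption's sense: wall-clock / segment count."""
    return wall_us / segments

WORKERS = 4                 # DU thread-pool size stated in the caption
SERIAL_SEGMENT_US = 40.0    # ASSUMPTION: hypothetical cost of one segment
SEGMENTS = 8                # ASSUMPTION: hypothetical segments per call

wall = wall_clock_us(SEGMENTS, WORKERS, SERIAL_SEGMENT_US)
normalized = per_segment_us(wall, SEGMENTS)
print(f"wall-clock: {wall} us, per-segment: {normalized} us, "
      f"serial segment cost: {SERIAL_SEGMENT_US} us")
```

In this toy model the normalized value is the serial segment cost divided by the worker count whenever the segment count is a multiple of the pool size, so it understates how long any single segment takes to decode.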