Twinning Commercial Network Traces on Experimental Open RAN Platforms

Leonardo Bonati, Ravis Shirkhani, Claudio Fiandrino, Stefano Maxenti, Salvatore D'Oro, Michele Polese, Tommaso Melodia

TL;DR

Public mobile network datasets are scarce, which hinders data-driven development of Open RAN. The paper introduces a traffic-twinning pipeline that characterizes real-world traces and reproduces them on the Colosseum Open RAN digital twin via mgen-based flows, using a granularity of $1$ s and an aggregation window $W$ to generate statistically equivalent traffic. The authors publicly release a dataset with more than $500$ hours and $450$ GB of cross-layer KPIs and protocol logs across LTE-like deployments, along with an extensive validation showing that end-to-end performance depends on cross-layer interactions beyond PHY/MAC. This work enables data augmentation, training of xApps/rApps/dApps, and robust evaluation of Open RAN control policies under realistic, dynamic traffic conditions.
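To make the roles of the $1$ s granularity and the aggregation window $W$ concrete, the following is a minimal sketch of how per-second samples might be grouped into windows of $W$ seconds. It is an illustration only, not the authors' pipeline: the function name `aggregate_windows`, the `(load_bytes, n_ues)` sample layout, and the example trace are all assumptions introduced here.

```python
from statistics import mean


def aggregate_windows(samples, window_s=60):
    """Group per-second samples into consecutive windows of window_s seconds.

    samples: list of (load_bytes, n_ues) tuples, one per second
             (hypothetical layout, assumed for this sketch).
    Returns one (total_load_bytes, mean_ues) tuple per window.
    The last window may cover fewer than window_s seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    windows = []
    for start in range(0, len(samples), window_s):
        chunk = samples[start:start + window_s]
        total_load = sum(load for load, _ in chunk)
        avg_ues = mean(ues for _, ues in chunk)
        windows.append((total_load, avg_ues))
    return windows


# Hypothetical two-minute trace: a quiet minute followed by a busier one.
trace = [(1000, 2)] * 60 + [(3000, 4)] * 60
summary = aggregate_windows(trace, window_s=60)
```

With $W = 60$ s, each entry of `summary` describes one minute of traffic by its total load and average number of UEs, which is the kind of per-window summary a later characterization step could work from.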

Abstract

While the availability of large datasets has been instrumental in advancing fields like computer vision and natural language processing, this has not been the case in mobile networking. Indeed, mobile traffic data is often unavailable due to privacy or regulatory concerns. This problem becomes especially relevant in the Open Radio Access Network (RAN), where artificial intelligence can potentially drive optimization and control of the RAN, but still lags behind due to the lack of training datasets. While substantial work has focused on developing testbeds that can accurately reflect production environments, the same level of effort has not been put into twinning the traffic that traverses such networks. To fill this gap, in this paper, we design a methodology to twin real-world cellular traffic traces in experimental Open RAN testbeds. We demonstrate our approach on the Colosseum Open RAN digital twin, and publicly release a large dataset (more than 500 hours and 450 GB) with PHY-, MAC-, and App-layer Key Performance Measurements (KPMs) and protocol stack logs. Our analysis shows that our dataset can be used to develop and evaluate a number of Open RAN use cases, including those with strict latency requirements.

Paper Structure

This paper contains 10 sections, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: Pipeline to twin traffic traces from real-world datasets.
  • Figure 2: Example of clustering of traffic of a production BS operating with a 20 MHz channel bandwidth to identify slicing profiles.
  • Figure 3: Snapshot of traffic of a production BS. We report traffic load, average number of UEs and cluster for windows of $W=1$ minute.
  • Figure 4: CDF of MAC-layer downlink throughput of eMBB UEs for different slicing configurations.
  • Figure 5: Bar plot of MAC- and App-layer downlink throughput of eMBB UEs for different slicing configurations.
  • ...and 5 more figures