Twinning Commercial Network Traces on Experimental Open RAN Platforms
Leonardo Bonati, Ravis Shirkhani, Claudio Fiandrino, Stefano Maxenti, Salvatore D'Oro, Michele Polese, Tommaso Melodia
TL;DR
Public mobile network datasets are scarce, which hinders data-driven development of Open RAN. The paper introduces a traffic-twinning pipeline that characterizes real traces and reproduces them on the Colosseum Open RAN digital twin via MGEN-based flows, using a granularity of $1$ s and an aggregation window $W$ to generate statistically equivalent traffic. The authors publicly release a dataset with over $500$ hours and more than $450$ GB of cross-layer KPIs and protocol logs across LTE-like deployments, along with an extensive validation showing that end-to-end performance depends on cross-layer interactions beyond PHY/MAC. This work enables data augmentation, training of xApps/rApps/dApps, and robust evaluation of Open RAN control policies under realistic, dynamic traffic conditions.
Abstract
While the availability of large datasets has been instrumental in advancing fields like computer vision and natural language processing, this has not been the case in mobile networking. Indeed, mobile traffic data is often unavailable due to privacy or regulatory concerns. This problem becomes especially relevant in the Open Radio Access Network (RAN), where artificial intelligence can potentially drive optimization and control of the RAN, but still lags behind due to the lack of training datasets. While substantial work has focused on developing testbeds that can accurately reflect production environments, the same level of effort has not been put into twinning the traffic that traverses such networks. To fill this gap, in this paper, we design a methodology to twin real-world cellular traffic traces in experimental Open RAN testbeds. We demonstrate our approach on the Colosseum Open RAN digital twin, and publicly release a large dataset (more than 500 hours and 450 GB) with PHY-, MAC-, and App-layer Key Performance Measurements (KPMs), as well as protocol stack logs. Our analysis shows that our dataset can be used to develop and evaluate a number of Open RAN use cases, including those with strict latency requirements.
