How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks
Clement Ruah, Houssem Sifaou, Osvaldo Simeone, Bashir M. Al-Hashimi
TL;DR
The paper tackles data scarcity in telecom AI by using digital twins to generate site-specific training data, and reviews how to bridge the resulting sim-to-real gap. It covers two complementary avenues: calibrating digital twins with real measurements (including differentiable simulation and phase-error-aware calibration), and robust training approaches that explicitly account for the residual gap (Bayesian environment modeling and prediction-powered inference, or PPI). Three PPI-based training variants (Semi-Supervised PPI, Cross-PPI, and Context-Aware PPI) offer data-efficient strategies to mitigate simulator bias using limited real data. The work highlights practical benefits for network automation while acknowledging open challenges such as real-time fidelity, cross-site transfer, and continual adaptation.
Abstract
Training effective artificial intelligence models for telecommunications is challenging due to the scarcity of deployment-specific data. Real data collection is expensive, and available datasets often fail to capture the unique operational conditions and contextual variability of the network environment. Digital twinning provides a potential solution to this problem, as simulators tailored to the current network deployment can generate site-specific data to augment the available training datasets. However, there is a need to develop solutions to bridge the inherent simulation-to-reality (sim-to-real) gap between synthetic and real-world data. This paper reviews recent advances in two complementary strategies: 1) the calibration of digital twins (DTs) through real-world measurements, and 2) the use of sim-to-real gap-aware training strategies to robustly handle residual discrepancies between DT-generated and real data. For the latter, we evaluate two conceptually distinct methods that model the sim-to-real gap either at the level of the environment via Bayesian learning or at the level of the training loss via prediction-powered inference.
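To make the idea of correcting simulator bias with a small amount of real data concrete, the following is a minimal sketch of the generic prediction-powered mean estimator, not the paper's training procedure. It assumes the digital twin's outputs play the role of predictions: a large set of simulated values is averaged, and a "rectifier" computed from a few paired simulated and real measurements removes the average simulator bias. The function name `ppi_mean` and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def ppi_mean(sim_unpaired, sim_paired, real_paired):
    """Prediction-powered estimate of the mean of a real-world quantity.

    sim_unpaired: simulator outputs at N contexts without real measurements
    sim_paired:   simulator outputs at n contexts that also have real measurements
    real_paired:  real measurements at those same n contexts

    All names are hypothetical; this is the textbook PPI mean estimator,
    not an implementation taken from the paper.
    """
    sim_unpaired = np.asarray(sim_unpaired, dtype=float)
    sim_paired = np.asarray(sim_paired, dtype=float)
    real_paired = np.asarray(real_paired, dtype=float)
    if sim_paired.shape != real_paired.shape:
        raise ValueError("paired simulated and real arrays must have the same shape")

    # Rectifier: average gap between real and simulated values on paired data.
    rectifier = np.mean(real_paired - sim_paired)

    # Simulation-only estimate, corrected by the rectifier.
    return float(np.mean(sim_unpaired) + rectifier)


if __name__ == "__main__":
    # Toy example: the simulator overestimates the quantity by 3 units.
    estimate = ppi_mean(
        sim_unpaired=[10.0, 12.0, 14.0],
        sim_paired=[11.0, 13.0],
        real_paired=[8.0, 10.0],
    )
    print(estimate)  # 9.0
```

In the toy call, the simulation-only average is 12.0, the rectifier is -3.0, and the corrected estimate is 9.0. The same structure, a large simulated term plus a small real-data correction, is what loss-level PPI variants apply to a training objective rather than to a simple mean.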
