Streamlined Transmission: A Semantic-Aware XR Deployment Framework Enhanced by Generative AI
Wanting Yang, Zehui Xiong, Tony Q. S. Quek, Xuemin Shen
TL;DR
The paper addresses the challenge of delivering immersive wireless XR in 6G under scarce radio resources. It introduces GeSa-XRF, a three-stage semantic-aware XR deployment framework augmented by Generative AI (GAI) that combines DL-based multi-modal SemCom data collection, multi-task FoV and attention analysis, and semantic-aware multicast data delivery. Key contributions are: a multi-modal SemCom framework with fusion and separation; a GAI-assisted robust data superposition scheme; MTL-based FoV and attention prediction with per-tile semantic significance; and a semantic-aware delivery strategy with dynamic transcoding and denoising guided by a three-dimensional QoE metric. A case study demonstrates improvements in PSNR and LPIPS and reduced pixel-level redundancy. The framework aims to enable scalable, personalized, and synchronized XR experiences over wireless networks by reducing transmission load and latency while maintaining immersive quality.
Abstract
In the era of 6G, featuring compelling visions of digital twins and metaverses, Extended Reality (XR) has emerged as a vital conduit connecting the digital and physical realms, garnering widespread interest. Ensuring a fully immersive wireless XR experience stands as a paramount technical necessity, demanding the liberation of XR from the confines of wired connections. In this paper, we first introduce the technologies applied in the wireless XR domain, discuss their benefits and limitations, and highlight the ongoing challenges. We then propose a novel deployment framework for a broad XR pipeline, termed "GeSa-XRF", inspired by the core philosophy of Semantic Communication (SemCom), which shifts the concern from "how" to transmit to "what" to transmit. Specifically, the framework comprises three stages: data collection, data analysis, and data delivery. In each stage, we integrate semantic awareness to achieve streamlined transmission and employ Generative Artificial Intelligence (GAI) to achieve collaborative refinements. For the collection of multi-modal data with differentiated data volumes and heterogeneous latency requirements, we propose a novel SemCom paradigm based on multi-modal fusion and separation and a GAI-based robust superposition scheme. To perform a comprehensive data analysis, we employ multi-task learning to predict the field of view and personalized attention, and we discuss possible preprocessing approaches assisted by GAI. Lastly, for the data delivery stage, we present a semantic-aware multicast-based delivery strategy aimed at reducing pixel-level redundant transmissions and introduce a GAI-based collaborative refinement approach. The performance gain of the proposed GeSa-XRF is preliminarily demonstrated through a case study.
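
To make the multicast idea in the delivery stage concrete, here is a minimal Python sketch of how sending each tile once to every user whose predicted field of view contains it can cut redundant transmissions compared with per-user unicast. It is an illustration under stated assumptions, not the GeSa-XRF strategy itself: the function names, the user labels, and the tile indices are all hypothetical, and the sketch ignores the semantic significance weighting, transcoding, and denoising that the framework describes.

```python
from collections import defaultdict


def group_tiles_for_multicast(predicted_fov):
    """Map each tile to the set of users whose predicted FoV contains it."""
    tile_to_users = defaultdict(set)
    for user, tiles in predicted_fov.items():
        for tile in tiles:
            tile_to_users[tile].add(user)
    return dict(tile_to_users)


def transmission_counts(predicted_fov):
    """Return (unicast, multicast) tile transmission counts.

    Unicast sends every tile separately to every user that needs it;
    multicast sends each distinct tile once to its whole user group.
    """
    unicast = sum(len(set(tiles)) for tiles in predicted_fov.values())
    multicast = len(group_tiles_for_multicast(predicted_fov))
    return unicast, multicast


# Hypothetical predicted FoVs: user -> tile indices on a tiled frame.
fov = {"u1": {0, 1, 2}, "u2": {1, 2, 3}, "u3": {2, 3, 4}}
groups = group_tiles_for_multicast(fov)
unicast, multicast = transmission_counts(fov)
print(f"unicast tile sends: {unicast}, multicast tile sends: {multicast}")
```

In this toy input the three users' fields of view overlap, so tiles shared between users are counted once under multicast instead of once per user. The saving grows with the degree of overlap among the predicted fields of view.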
