Federated Multi-View Synthesizing for Metaverse
Yiyu Guo, Zhijin Qin, Xiaoming Tao, Geoffrey Ye Li
TL;DR
The paper tackles wireless VR delivery for the metaverse by introducing a federated, 3D-aware multi-view synthesizing framework that multicasts single-view images to groups of users. A NeRF-inspired generator, consisting of a mapping network and a synthesis network, renders multi-view content at the edge or user side. Federated learning exploits horizontally and vertically distributed data to train efficiently and privately, and federated transfer learning supplements it for rapid domain adaptation. Results show image quality (FID/KID) competitive with centralized training and notable latency reductions, with frame render times of 80–120 ms on typical GPU hardware as resolution scales from 512×512 to 1024×1024. The approach also lowers communication load via partial parameter updates and EMA-based aggregation. This work advances edge-enabled metaverse experiences by reducing bandwidth, latency, and data-sharing needs, enabling scalable, privacy-preserving VR content delivery across many devices, though relying on single-view inputs trades some fidelity for efficiency.
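The summary does not spell out the aggregation rule, so the following is only a minimal sketch of one plausible reading of "partial parameter updates and EMA-based aggregation." Clients upload only a chosen subset of parameter groups, and the server takes a sample-size-weighted average of those groups (as in FedAvg). It then blends that average into the global model with an exponential moving average. The function names `aggregate_partial_ema` and `uploaded_floats`, the group names `mapping` and `synthesis`, the choice of which groups are shared, and the decay `beta` are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical model representation: parameter-group name -> flat list of weights.
Params = Dict[str, List[float]]


def aggregate_partial_ema(
    global_params: Params,
    client_updates: Sequence[Params],
    client_sizes: Sequence[int],
    shared_keys: Sequence[str],
    beta: float = 0.9,
) -> Params:
    """Average only the shared groups (weighted by client sample count),
    then EMA-blend the average into the global model.

    Groups not listed in shared_keys are never uploaded and stay unchanged.
    The input global_params is not modified; a new dict is returned.
    """
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    new_params: Params = {k: list(v) for k, v in global_params.items()}
    for key in shared_keys:
        avg = [0.0] * len(global_params[key])
        for update, n in zip(client_updates, client_sizes):
            for i, value in enumerate(update[key]):
                avg[i] += (n / total) * value
        # EMA: keep a fraction beta of the old global weights.
        new_params[key] = [
            beta * g + (1.0 - beta) * a for g, a in zip(global_params[key], avg)
        ]
    return new_params


def uploaded_floats(global_params: Params, shared_keys: Sequence[str]) -> int:
    """Number of values each client uploads per round under partial updates."""
    return sum(len(global_params[k]) for k in shared_keys)


if __name__ == "__main__":
    global_model = {"mapping": [0.0, 0.0], "synthesis": [1.0, 1.0, 1.0]}
    # Each client uploads only its "mapping" group in this round.
    updates = [{"mapping": [1.0, 2.0]}, {"mapping": [3.0, 4.0]}]
    merged = aggregate_partial_ema(global_model, updates, [1, 3], ["mapping"], beta=0.5)
    print(merged)
    print("uploaded per client:", uploaded_floats(global_model, ["mapping"]),
          "of", uploaded_floats(global_model, list(global_model)))
```

A larger `beta` makes the global model move more slowly, which can smooth noisy rounds at the cost of slower convergence. The per-client upload cost scales with the size of `shared_keys` rather than with the full model.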
Abstract
The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. Building on recent advances in edge intelligence and deep learning, we have developed a novel multi-view synthesizing framework that makes efficient use of computation, storage, and communication resources for wireless content delivery in the metaverse. We propose a three-dimensional (3D)-aware generative model that is trained on collections of single-view images. These single-view images are transmitted to a group of users with overlapping fields of view, which avoids the massive transmission volume of sending tiles or whole 3D models. We then present a federated learning approach to guarantee an efficient learning process. Training performance can be improved by characterizing the vertical and horizontal data samples with a large latent feature space, while low-latency communication is achieved by reducing the number of parameters transmitted during federated learning. We also propose a federated transfer learning framework to enable fast adaptation to different target domains. Simulation results demonstrate the effectiveness of the proposed federated multi-view synthesizing framework for VR content delivery.
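As a toy illustration of why multicasting to users with overlapping fields of view saves transmissions, assume each user's viewing direction is summarized by a single yaw angle and all users share the same field-of-view width. Users can then be grouped greedily so that every pair in a group has overlapping views, and one single-view image is sent per group instead of one stream per user. The grouping rule, the function name `group_by_overlap`, and the 1D yaw model are hypothetical simplifications, not the paper's scheduling method.

```python
from typing import List


def group_by_overlap(yaws_deg: List[float], fov_deg: float) -> List[List[float]]:
    """Greedily group users whose fields of view pairwise overlap.

    Two users with yaws a and b and a common FoV width w overlap when
    |a - b| < w. Each group starts at the smallest unassigned yaw, and later
    yaws join while they stay within w of that first member, so every pair
    in the group differs by less than w. Wrap-around at 360 degrees is ignored.
    """
    groups: List[List[float]] = []
    for yaw in sorted(yaws_deg):
        if groups and yaw - groups[-1][0] < fov_deg:
            groups[-1].append(yaw)
        else:
            groups.append([yaw])
    return groups


if __name__ == "__main__":
    yaws = [0.0, 30.0, 50.0, 100.0, 120.0, 200.0]
    groups = group_by_overlap(yaws, fov_deg=90.0)
    print("groups:", groups)
    print("unicast transmissions:", len(yaws))
    print("multicast transmissions:", len(groups))
```

With these six users and a 90-degree field of view, the sketch forms three groups, `[0, 30, 50]`, `[100, 120]`, and `[200]`, so three multicast transmissions replace six unicast ones. A real system would also need to handle wrap-around, vertical viewing angles, and per-user channel conditions.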
