Federated Multi-View Synthesizing for Metaverse

Yiyu Guo, Zhijin Qin, Xiaoming Tao, Geoffrey Ye Li

TL;DR

The paper tackles wireless VR delivery for the metaverse by introducing a federated, 3D-aware multi-view synthesizing framework that uses single-view inputs multicast to user groups. A NeRF-inspired generator with a mapping network and a synthesis network renders multi-view content at the edge/user side, while federated learning exploits horizontal and vertical data distributions to train efficiently and privately, supplemented by federated transfer learning for rapid domain adaptation. Results show competitive quality (FID/KID) versus centralized training and notable latency reductions, with 80–120 ms frame render times on typical GPU hardware as resolution scales from 512^2 to 1024^2; the approach also reduces communication rounds via partial parameter updates and EMA-based aggregation. This work advances edge-enabled metaverse experiences by reducing bandwidth, latency, and data-sharing needs, enabling scalable, privacy-preserving VR content delivery across many devices, though it trades some fidelity for efficiency when using single-view inputs.

Abstract

The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. With recent advances in edge intelligence and deep learning, we have developed a novel multi-view synthesizing framework that can efficiently provide computation, storage, and communication resources for wireless content delivery in the metaverse. We propose a three-dimensional (3D)-aware generative model that uses collections of single-view images. These single-view images are transmitted to a group of users with overlapping fields of view, which avoids massive content transmission compared to transmitting tiles or whole 3D models. We then present a federated learning approach to guarantee an efficient learning process. The training performance can be improved by characterizing the vertical and horizontal data samples with a large latent feature space, while low-latency communication can be achieved with a reduced number of transmitted parameters during federated learning. We also propose a federated transfer learning framework to enable fast domain adaptation to different target domains. Simulation results have demonstrated the effectiveness of our proposed federated multi-view synthesizing framework for VR content delivery.
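The abstract's point about transmitting fewer parameters, together with the EMA-based aggregation mentioned in the TL;DR, can be illustrated with a minimal sketch. This is not the authors' algorithm: the group names `mapping` and `synthesis` are borrowed from the generator description, while the choice of which group each client uploads, the `decay` value, and the helper names `select_upload` and `aggregate_ema` are assumptions made only for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter-group name -> flat list of weights.
Params = Dict[str, List[float]]


def select_upload(local: Params, groups: List[str]) -> Params:
    """Return only the parameter groups a client chooses to transmit."""
    return {name: list(local[name]) for name in groups if name in local}


def aggregate_ema(global_params: Params, uploads: List[Params],
                  decay: float = 0.9) -> Params:
    """Average each uploaded group over the clients that sent it, then blend
    that average into the global model with an exponential moving average."""
    updated = {name: list(values) for name, values in global_params.items()}
    for name, current in global_params.items():
        received = [u[name] for u in uploads if name in u]
        if not received:
            continue  # no client sent this group; keep the global copy
        mean = [sum(column) / len(received) for column in zip(*received)]
        updated[name] = [decay * g + (1.0 - decay) * m
                         for g, m in zip(current, mean)]
    return updated


# Toy values, assumed for illustration only.
global_model = {"mapping": [0.0, 0.0], "synthesis": [1.0, 1.0]}
client_a = {"mapping": [1.0, 2.0], "synthesis": [3.0, 3.0]}
client_b = {"mapping": [3.0, 4.0], "synthesis": [5.0, 5.0]}

# Both clients upload only the (assumed) "mapping" group.
uploads = [
    select_upload(client_a, ["mapping"]),
    select_upload(client_b, ["mapping"]),
]
new_global = aggregate_ema(global_model, uploads, decay=0.5)
print(new_global)  # {'mapping': [1.0, 1.5], 'synthesis': [1.0, 1.0]}
```

In this toy run only the `mapping` group crosses the network, so the `synthesis` weights in the global model stay unchanged, which is the source of the communication saving the sketch is meant to show.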

Paper Structure

This paper contains 16 sections, 24 equations, 9 figures, 1 table, 2 algorithms.

Figures (9)

  • Figure 1: Proposed wireless VR scheme, where the content provider multicasts the single-view input to users; users apply the 3D-aware generative model to synthesize the required VR content according to their viewports.
  • Figure 2: Proposed federated learning for model training, where datasets are categorized as horizontal or vertical; clients upload parts of their local models according to the characteristics of their datasets.
  • Figure 3: FID performance of the proposed federated multi-view synthesizing model with different image SNR.
  • Figure 4: KID performance of the proposed federated multi-view synthesizing model with different clients and dataset settings.
  • Figure 5: Results for the proposed VR network. With a single-view input, users requesting a certain range of FoV can be served. This largely reduces transmission overhead and latency compared with traditional VR schemes, which have to transmit multi-view content separately.
  • ...and 4 more figures