
AI-Driven Innovations in Volumetric Video Streaming: A Review

Erfan Entezami, Hui Guan

TL;DR

This paper analyzes AI-driven approaches to volumetric video streaming, focusing on how to efficiently transmit and render 6-DoF content represented as point clouds, NeRF, or 3D Gaussian splatting. It proposes a taxonomy distinguishing explicit/implicit and learnable/fixed representations, and surveys state-of-the-art techniques for each representation: viewport- and quality-based strategies for point clouds; time-aware, deformation-based, and grid-based NeRF methods with rendering accelerations; and motion-tracking and deformation-based extensions for 3DGS. Key contributions include synthesizing challenges and proposing future directions like robust motion handling, edge-device acceleration, and scalable long-sequence streaming. The insights are relevant for researchers and practitioners aiming to deploy volumetric streaming in immersive applications and future networks.

Abstract

Recent efforts to enhance immersive and interactive user experiences have driven the development of volumetric video, a form of 3D content that supports six degrees of freedom (6-DoF). Unlike traditional 2D content, volumetric content can be represented in various ways, such as point clouds, meshes, or neural representations. However, its complex structure and large data volume make transmitting and rendering this new form of 3D data challenging, which has hindered the widespread adoption of volumetric video in everyday applications. In recent years, researchers have proposed various AI-driven techniques to address these challenges and improve the efficiency and quality of volumetric content streaming. This paper provides a comprehensive overview of recent advances in AI-driven approaches to volumetric content streaming. Through this review, we aim to offer insights into the current state of the art and suggest potential future directions for deploying volumetric video streaming in real-world applications.

Paper Structure

This paper contains 19 sections, 5 equations, and 8 figures.

Figures (8)

  • Figure 1: The organization of this review.
  • Figure 2: Visualization of volumetric content representations, categorized as implicit or explicit, and learnable or fixed.
  • Figure 3: Main pipeline of NeRF. In step (a), the position and viewing direction of each sample point are fed to the MLP, which predicts its color and density; in step (b), these predicted values are integrated using volume rendering to obtain the final pixel color. The image is taken from mildenhall2021nerf.
  • Figure 4: The main training pipeline of 3DGS. The image is taken from kerbl20233d.
  • Figure 5: An example of a time-aware method that generates a 2D view of a dynamic scene from the position, viewing direction, and a compact time-varying latent code. The image is adapted from li2022neural.
  • ...and 3 more figures
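Step (b) of the NeRF pipeline in Figure 3 can be illustrated with a minimal sketch, assuming the standard quadrature from the original NeRF formulation (mildenhall2021nerf): for samples along a ray with densities σ_i, colors c_i, and spacings δ_i, the pixel color is Ĉ = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, where T_i = exp(−Σ_{j<i} σ_j δ_j) is the transmittance reaching sample i. The function name `volume_render` and the convention of treating the last interval as very long (1e10) are illustrative assumptions, not details taken from the reviewed paper.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def volume_render(
    sigmas: Sequence[float],
    colors: Sequence[RGB],
    t_vals: Sequence[float],
) -> Tuple[RGB, List[float]]:
    """Composite per-sample density and color along one ray (NeRF-style quadrature).

    sigmas[i], colors[i]: density and RGB color predicted by the MLP at sample i.
    t_vals: sorted sample depths along the ray, one per sample.
    Returns the pixel color and the per-sample weights w_i = T_i * alpha_i.
    Hypothetical helper for illustration only.
    """
    n = len(sigmas)
    if len(colors) != n or len(t_vals) != n:
        raise ValueError("sigmas, colors, and t_vals must have equal length")

    # Spacing between adjacent samples; the last interval is assumed to be
    # effectively infinite (1e10), a common convention in NeRF reimplementations.
    deltas = [t_vals[i + 1] - t_vals[i] for i in range(n - 1)] + [1e10]

    transmittance = 1.0  # T_1 = 1: nothing occludes the first sample
    weights: List[float] = []
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        w = transmittance * alpha  # contribution of sample i to the pixel
        weights.append(w)
        for k in range(3):
            pixel[k] += w * rgb[k]
        # T_{i+1} = T_i * exp(-sigma_i * delta_i) = T_i * (1 - alpha_i)
        transmittance *= 1.0 - alpha
    return (pixel[0], pixel[1], pixel[2]), weights


if __name__ == "__main__":
    # An empty red sample followed by a nearly opaque green one yields green.
    color, w = volume_render([0.0, 1e6], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.0, 1.0])
    print(color, w)
```

Because each weight equals the transmittance that survives to a sample times that sample's opacity, the weights sum to 1 − T_final and never exceed 1. A similar front-to-back alpha compositing over depth-sorted primitives is used when rasterizing Gaussians in 3DGS (Figure 4), where each opacity comes from a projected 2D Gaussian rather than from σ_i δ_i.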