Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Yuheng Jiang; Zhehao Shen; Yu Hong; Chengcheng Guo; Yize Wu; Yingliang Zhang; Jingyi Yu; Lan Xu

TL;DR

DualGS is a novel Gaussian-based approach for real-time, high-fidelity playback of complex human performance. It achieves a compression ratio of up to 120 times, requiring approximately 350KB of storage per frame.

Abstract

Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, existing workflows require extensive manual intervention to stabilize mesh sequences and generate excessively large assets, both of which impede broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to represent motion and appearance separately, using joint Gaussians for motion and skin Gaussians for appearance. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, requiring only approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.
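To put the reported storage figures in perspective, the short calculation below works out what they imply. It is only a back-of-the-envelope sketch: it assumes the 120x ratio and the 350KB per-frame size describe the same sequence (the abstract states 120x as an upper bound), and the 30 fps playback rate is an assumption made here for illustration, not a number from the abstract. The function and constant names are hypothetical.

```python
# Back-of-the-envelope storage arithmetic from the abstract's two figures.
# ASSUMED_FPS is an illustrative assumption, not a value from the paper.

COMPRESSED_KB_PER_FRAME = 350.0   # "approximately 350KB of storage per frame"
MAX_COMPRESSION_RATIO = 120.0     # "up to 120 times"
ASSUMED_FPS = 30                  # assumption for illustration only


def implied_raw_kb(compressed_kb: float, ratio: float) -> float:
    """Uncompressed size implied by a compressed size and a compression ratio."""
    return compressed_kb * ratio


def stream_rate_kb_per_s(compressed_kb: float, fps: int) -> float:
    """Data rate needed to play back frames of the given size at a given fps."""
    return compressed_kb * fps


raw_kb = implied_raw_kb(COMPRESSED_KB_PER_FRAME, MAX_COMPRESSION_RATIO)
rate_kb = stream_rate_kb_per_s(COMPRESSED_KB_PER_FRAME, ASSUMED_FPS)

print(f"implied uncompressed size: about {raw_kb / 1000:.0f} MB per frame")
print(f"playback rate at {ASSUMED_FPS} fps: about {rate_kb / 1000:.1f} MB/s")
```

Under these assumptions, a frame would occupy roughly 42 MB before compression, and streaming the compressed sequence would need on the order of 10.5 MB/s.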

Paper Structure (31 sections, 12 equations, 10 figures, 4 tables)

Introduction
Related Work
Human Performance Capture.
Neural Human Modeling.
Data Compression.
Dual-Gaussian Representation
Dual-Gaussian Initialization
Dual-Gaussian Optimization
Coarse Alignment.
Motion Prediction.
Fine-grained Optimization.
Compression
Residual Compression.
Codec Compression.
Persistent Code Book.
...and 16 more sections

Figures (10)

Figure 1: We propose a novel Dual Gaussian representation to capture challenging human performance from multi-view inputs. We first optimize joint Gaussians from a random point cloud, then use them to initialize skin Gaussians, expressing their motion through interpolation. In the following optimization, we employ a coarse-to-fine strategy, with a coarse alignment for overall motion prediction and fine-grained optimization for robust tracking and high-fidelity rendering.
Figure 2: Sampled results from our DualGS optimization pipeline. With the aid of our coarse-to-fine training strategy, we can produce high-fidelity 4D assets.
Figure 3: Illustration of our hybrid compression strategy. We compress joint Gaussian motions using residual vector quantization, encode opacity and scaling via codec compression, and represent spherical harmonics with a persistent codebook. Our approach achieves a compression ratio of up to 120 times.
Figure 4: Examples of data captured by our multi-view system. Our DualGS dataset includes a diverse range of musical instruments from both Western and Eastern traditions.
Figure 5: Illustration of our DualGS player implementation for the seamless integration of 4D sequences into Unity and mobile platforms, enhancing real-time immersive rendering across multiple devices.
...and 5 more figures
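
The Figure 1 caption says that skin Gaussians are anchored to joint Gaussians and that their motion is expressed through interpolation. The sketch below illustrates one simple way such anchoring could work. It is not the authors' implementation: the choice of k nearest joints, the inverse-distance weights, and the translation-only blending are all assumptions made here, and the function names `anchor_skin_to_joints` and `move_skin` are hypothetical.

```python
import math


def anchor_skin_to_joints(skin_pos, joint_positions, k=2, eps=1e-8):
    """Pick the k nearest joint Gaussians for one skin Gaussian.

    Returns (indices, weights), where the weights are normalized
    inverse distances (an assumed weighting, chosen for simplicity).
    """
    nearest = sorted(
        (math.dist(skin_pos, joint), index)
        for index, joint in enumerate(joint_positions)
    )[:k]
    raw = [1.0 / (distance + eps) for distance, _ in nearest]
    total = sum(raw)
    indices = [index for _, index in nearest]
    weights = [value / total for value in raw]
    return indices, weights


def move_skin(skin_pos, joints_first, joints_now, indices, weights):
    """Move a skin Gaussian by the weighted displacement of its anchor joints.

    Only translation is blended here; rotation is ignored in this sketch.
    """
    offset = [0.0, 0.0, 0.0]
    for index, weight in zip(indices, weights):
        for axis in range(3):
            delta = joints_now[index][axis] - joints_first[index][axis]
            offset[axis] += weight * delta
    return [skin_pos[axis] + offset[axis] for axis in range(3)]


# Usage: one skin Gaussian halfway between two joints; only joint 0 moves.
joints_first = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
joints_now = [(0.0, 2.0, 0.0), (2.0, 0.0, 0.0)]
skin = (1.0, 0.0, 0.0)

indices, weights = anchor_skin_to_joints(skin, joints_first, k=2)
print(move_skin(skin, joints_first, joints_now, indices, weights))
```

Computing the anchor indices and weights once at the first frame means that later frames only need the joint positions to place every skin Gaussian, which is consistent with the caption's description of skin motion being derived from joint motion.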
