Detail Enhanced Gaussian Splatting for Large-Scale Volumetric Capture

Julien Philip; Li Ma; Pascal Clausen; Wenqi Xian; Ahmet Levent Taşel; Mingming He; Xueming Yu; David M. George; Ning Yu; Oliver Pilarski; Paul Debevec

TL;DR

This work introduces a two-rig, large-scale 4D volumetric capture pipeline that combines Poly4DGS dynamic Gaussian splatting with diffusion-based detail enhancement to produce production-quality 4K facial closeups. It addresses the gap between scalable scene capture and high-resolution rendering by (i) capturing multi-actor performances with a Scene Rig, (ii) capturing actor-specific facial detail with a Face Rig, and (iii) training a diffusion model on paired low- and high-quality GS data to add fine detail and restore alpha. Ablations and comparisons show improved temporal stability and detail fidelity, enabling realistic free-viewpoint video suitable for film and television production. However, the method requires substantial hardware and compute, and some temporal artifacts persist, motivating future work on relighting and improved eye reflections.

Abstract

We present a unique system for large-scale, multi-performer, high-resolution 4D volumetric capture that provides realistic free-viewpoint video up to and including 4K-resolution facial closeups. To achieve this, we employ a novel volumetric capture, reconstruction, and rendering pipeline based on Dynamic Gaussian Splatting and Diffusion-based Detail Enhancement, designed specifically to meet the demands of high-end media production. We use two capture rigs: the Scene Rig, which captures multi-actor performances at a resolution that falls short of 4K production quality, and the Face Rig, which records high-fidelity single-actor facial detail to serve as a reference for detail enhancement. We first reconstruct dynamic performances from the Scene Rig using 4D Gaussian Splatting, incorporating new model designs and training strategies to improve reconstruction, dynamic range, and rendering quality. Then, to render high-quality images for facial closeups, we introduce a diffusion-based detail enhancement model, fine-tuned with high-fidelity data of the same actors recorded in the Face Rig. We train this model on paired data generated from low- and high-quality Gaussian Splatting (GS) models: renders from the low-quality GS, whose quality is matched to that of the Scene Rig, serve as input, and renders from the high-quality GS serve as ground truth. Our results demonstrate that this pipeline bridges the gap between the scalable performance capture of a large-scale rig and the high-resolution standards required for film and media production.
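The training recipe above pairs renders from a low-quality and a high-quality GS model of the same actor. The Python sketch below illustrates only that pairing step. It assumes both models can be rendered from the same camera and frame at the same output resolution. `Camera`, `Renderer`, `build_paired_dataset`, and `constant_renderer` are hypothetical names introduced here for illustration. The GS models and the diffusion model itself are out of scope.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Toy image type for illustration: rows of RGB tuples with values in [0, 1].
Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera record; a real rig would also store intrinsics and extrinsics."""
    name: str
    width: int
    height: int


# Hypothetical renderer interface: (camera, frame index) -> rendered image.
# The low- and high-quality GS models are both treated as opaque renderers.
Renderer = Callable[[Camera, int], Image]


def build_paired_dataset(
    cameras: Sequence[Camera],
    frames: Sequence[int],
    render_low: Renderer,
    render_high: Renderer,
) -> List[Tuple[Image, Image]]:
    """Return (input, target) pairs, one per (frame, camera) combination.

    Both models are rendered from the same viewpoint and time, so each pair is
    pixel-aligned and the enhancement model only has to learn the missing detail.
    """
    pairs: List[Tuple[Image, Image]] = []
    for frame in frames:
        for cam in cameras:
            low = render_low(cam, frame)    # input: expected to match Scene Rig quality
            high = render_high(cam, frame)  # target: high-quality GS render
            # Assumption of this sketch: both renders share one output resolution.
            if len(low) != len(high) or len(low[0]) != len(high[0]):
                raise ValueError(f"size mismatch for {cam.name}, frame {frame}")
            pairs.append((low, high))
    return pairs


def constant_renderer(value: float) -> Renderer:
    """Stand-in renderer that fills the frame with a single gray level."""
    def render(cam: Camera, frame: int) -> Image:
        return [[(value, value, value)] * cam.width for _ in range(cam.height)]
    return render


if __name__ == "__main__":
    cams = [Camera("cam0", 4, 3), Camera("cam1", 4, 3)]
    pairs = build_paired_dataset(cams, range(2), constant_renderer(0.2), constant_renderer(0.8))
    print(len(pairs))  # 4: 2 frames x 2 cameras
```

The renderers are plain callables, so the sketch does not commit to a particular GS implementation or to a particular way of matching the low-quality model to Scene Rig quality. The abstract does not specify either.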
