A Fast Volumetric Capture and Reconstruction Pipeline for Dynamic Point Clouds and Gaussian Splats

Athanasios Charisoudis, Simone Croci, Lam Kit Yung, Pascal Frossard, Aljosa Smolic

TL;DR

The paper presents a fast, modular volumetric capture system that works with RGB-D or RGB-only input and outputs both point clouds and Gaussian splats. It integrates a unified, GPU-accelerated preprocessing stage with two parallel reconstruction backends, introducing world-frame Gaussian rotation re-parameterization and targeted fine-tuning of GPS-Gaussian to improve robustness across camera configurations. Outputs are provided in standard formats (PLY, MPEG V-PCC) and SPLAT, with web-based viewers and Unity/Unreal plugins enabling on-site previews at 5–10 FPS. The work emphasizes deployability, open-source release, and practical applicability in unconstrained environments, offering a scalable path for real-time volumetric reconstruction with commodity hardware.

Abstract

We present a fast and efficient volumetric capture and reconstruction system that processes either RGB-D or RGB-only input to generate 3D representations in the form of point clouds and Gaussian splats. For Gaussian splat reconstructions, we took the GPS-Gaussian regressor and improved it, enabling high-quality reconstructions with minimal overhead. The system is designed for easy setup and deployment, supporting in-the-wild operation under uncontrolled illumination and arbitrary backgrounds, as well as flexible camera configurations, including sparse setups, arbitrary camera numbers and baselines. Captured data can be exported in standard formats such as PLY, MPEG V-PCC, and SPLAT, and visualized through a web-based viewer or Unity/Unreal plugins. A live on-location preview of both input and reconstruction is available at 5-10 FPS. We present qualitative findings focused on deployability and targeted ablations. The complete framework is open-source, facilitating reproducibility and further research.

Paper Structure

This paper contains 15 sections, 16 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Left: capture in action. Right: pipeline overview (data flow from left to right). Per-camera RGB-D/RGB inputs are processed in parallel; vertical bars indicate synchronization points. Segmentation and depth processing run in parallel, then point-cloud and Gaussian-splat reconstructions are computed in parallel. A live monitor shows reconstruction previews at 5 to 10 FPS, cycling across input cameras; after processing, teaser clips and reconstructions are served in the web viewer.
  • Figure 2: Color cues for two sample image pairs. From left to right: color, mask, optical flow, and disparity from RAFT-Stereo and from FoundationStereo. RAFT-Stereo was trained on human data only and predicts more accurate disparity ranges.
  • Figure 3: Illustration of how the bilateral spatiotemporal filter is applied to two consecutive frames. The stripes denote valid depth values, while the white regions denote holes.
  • Figure 4: Sensor depth (left block): raw, spatially filtered (BS), and spatio-temporally filtered (BS+T). Stereo-estimated depth (right block) is computed from rectified pairs and shown without bilateral filtering.
  • Figure 5: Sample outputs of the dynamic reconstruction pipeline. We randomly select five timesteps for four subjects from the output of the pipeline. For each subject, point clouds and Gaussian splat reconstructions are given in triplets, and we visualize three views per triplet.
  • ...and 3 more figures
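
The Figure 3 caption describes a bilateral spatiotemporal filter applied to two consecutive depth frames that contain holes. The following is a minimal NumPy sketch of one way such a filter could work, not the paper's implementation. The function name `spatiotemporal_bilateral`, the convention that 0 marks a hole, the fixed temporal weight `w_t`, and all parameter defaults are assumptions made for illustration.

```python
import numpy as np


def spatiotemporal_bilateral(curr, prev, radius=2, sigma_s=1.5, sigma_r=0.05, w_t=0.5):
    """Hole-aware bilateral filter over two consecutive depth frames (illustrative).

    curr, prev: (H, W) float arrays in metres; 0 marks a hole (assumed convention).
    Returns a filtered copy of `curr`; pixels with no valid neighbour stay 0.
    """
    h, w = curr.shape
    num = np.zeros((h, w), dtype=np.float64)  # weighted sum of neighbour depths
    den = np.zeros((h, w), dtype=np.float64)  # sum of weights
    centre_valid = curr > 0

    # Zero padding means out-of-image neighbours are treated as holes.
    frames = [(np.pad(curr, radius), 1.0), (np.pad(prev, radius), w_t)]

    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            for padded, temporal in frames:
                nb = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
                valid = nb > 0
                # The range term needs a centre depth to compare against;
                # hole pixels fall back to spatial and temporal weights only.
                rng = np.where(
                    centre_valid,
                    np.exp(-((nb - curr) ** 2) / (2.0 * sigma_r ** 2)),
                    1.0,
                )
                wgt = spatial * temporal * rng * valid
                num += wgt * nb
                den += wgt

    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)
    return out
```

In this sketch, holes in the current frame are filled from valid neighbours in either frame, while the range term keeps depth discontinuities from being blurred. A production version would likely run on the GPU and might warp the previous frame (for example with optical flow) before blending, which this sketch does not attempt.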