PackUV: Packed Gaussian UV Maps for 4D Volumetric Video

Aashish Rai, Angela Xing, Anushka Agarwal, Xiaoyan Cong, Zekun Li, Tao Lu, Aayush Prakash, Srinath Sridhar

TL;DR

PackUV is a novel 4D Gaussian representation that maps all Gaussian attributes into a sequence of structured, multi-scale UV atlases, enabling compact, image-native storage. It surpasses existing baselines in rendering fidelity while scaling to sequences of up to 30 minutes with consistent quality.

Abstract

Volumetric videos offer immersive 4D experiences, but remain difficult to reconstruct, store, and stream at scale. Existing Gaussian Splatting-based methods achieve high-quality reconstruction but break down on long sequences, suffer from temporal inconsistency, and fail under large motions and disocclusions. Moreover, their outputs are typically incompatible with conventional video coding pipelines, which prevents practical applications. We introduce PackUV, a novel 4D Gaussian representation that maps all Gaussian attributes into a sequence of structured, multi-scale UV atlases, enabling compact, image-native storage. To fit this representation from multi-view videos, we propose PackUV-GS, a temporally consistent fitting method that directly optimizes Gaussian parameters in the UV domain. A flow-guided Gaussian labeling and video keyframing module identifies dynamic Gaussians, stabilizes static regions, and preserves temporal coherence even under large motions and disocclusions. The resulting UV atlas format is the first unified volumetric video representation compatible with standard video codecs (e.g., FFV1) without losing quality, enabling efficient streaming within existing multimedia infrastructure. To evaluate long-duration volumetric capture, we present PackUV-2B, the largest multi-view video dataset to date, featuring more than 50 synchronized cameras, substantial motion, and frequent disocclusions across 100 sequences and 2B (billion) frames. Extensive experiments demonstrate that our method surpasses existing baselines in rendering fidelity while scaling to sequences of up to 30 minutes with consistent quality.

Paper Structure

This paper contains 50 sections, 17 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: We propose a novel and compact 4D representation, PackUV, for volumetric videos that packs 3D Gaussian attributes into a sequence of 2D UV atlases (yellow, top right). PackUV is readily compatible with existing video coding infrastructure (e.g., can be coded with HEVC, FFV1). We also propose PackUV-GS, a method to directly fit Gaussian attributes from multi-view RGB videos into structured PackUV (blue, top left) via optical flow-guided keyframing and Gaussian labeling to fit arbitrary length sequences with temporal consistency even in the presence of large motions and disocclusions. The fitted scene can be rendered back to streamable volumetric video from any viewpoint (red, bottom). We also propose PackUV-2B, the largest 4D multi-view dataset containing 2B frames captured with over 50 synchronized cameras to provide 360$^\circ$ coverage.
  • Figure 2: (Top) Three UV-map organization strategies: (a) naïvely stacking UV layers (deeper layers become increasingly sparse); (b) a geometric-progression UV pyramid (more uniform sparsity with less storage); (c) PackUV, which packs all pyramid layers into a single UV atlas for efficient, codec-friendly processing. (Bottom) We propose PackUV-GS, a new representation based on 3DGS with a discrete spatial distribution constraint via UV fitting. It uses multi-layer UV images to store the Gaussian attributes during 3DGS fitting. To constrain the 3D Gaussians to lie on the discrete rays, we propose a UV-based Adaptive Density Control. We also use a stream-based training scheme based on keyframes (image with yellow border).
  • Figure 3: PackUV-GS vs. baselines for large motion and disocclusion handling. The proposed keyframing and Gaussian labeling strategy effectively manages complex scenarios, such as new objects or people entering a room and dispersing. Zoom in for a better view.
  • Figure 4: Optical flow. To assess long-term temporal stability, we compute optical flow between consecutive timestamps.
  • Figure 5: (Left) Compression evaluation via different methods. (Right) PSNR consistency over time.
  • ...and 7 more figures
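
The three UV-map organization strategies in the Figure 2 caption can be illustrated with a small layout calculation. The sketch below is not the authors' implementation: it assumes square layers whose side length halves at each level (the caption says only "geometric progression", so the ratio of 2 is an assumption) and a mipmap-style placement with the base layer on the left and the smaller layers stacked in a column to its right. The names `Rect`, `pyramid_sizes`, and `pack_pyramid` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Placement of one square UV layer inside the atlas (in texels)."""
    x: int
    y: int
    size: int


def pyramid_sizes(base: int, num_layers: int) -> List[int]:
    # Assumed geometric progression: the side length halves at every layer.
    return [max(1, base >> k) for k in range(num_layers)]


def pack_pyramid(base: int, num_layers: int) -> Tuple[List[Rect], int, int]:
    """Place all pyramid layers into one atlas; return (rects, width, height)."""
    sizes = pyramid_sizes(base, num_layers)
    rects = [Rect(0, 0, sizes[0])]   # layer 0 fills the left part of the atlas
    y = 0
    for s in sizes[1:]:              # remaining layers go in a column on the right
        rects.append(Rect(base, y, s))
        y += s
    width = base + (sizes[1] if num_layers > 1 else 0)
    height = max(base, y)
    return rects, width, height


if __name__ == "__main__":
    base, layers = 1024, 4
    rects, w, h = pack_pyramid(base, layers)
    stacked = layers * base * base               # strategy (a): full-size layers
    pyramid = sum(r.size ** 2 for r in rects)    # strategy (b): shrinking layers
    print(f"stacked texels: {stacked}")
    print(f"pyramid texels: {pyramid}")
    print(f"atlas: {w}x{h} = {w * h} texels")    # strategy (c): one packed image
```

Under these assumptions, a 1024-texel base with 4 layers needs 4,194,304 texels per attribute channel when stacked naively, 1,392,640 texels as a pyramid, and fits in a single 1536×1024 atlas. A single image per timestep is what lets a sequence of atlases be passed to an ordinary video codec as frames.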