RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance

Yuheng Jiang, Zhehao Shen, Chengcheng Guo, Yu Hong, Zhuo Su, Yingliang Zhang, Marc Habermann, Lan Xu

TL;DR

RePerformer tackles the challenge of delivering both faithful playback and vivid re-performance for general non-rigid, human-centric volumetric videos. It introduces a hierarchical Gaussian representation consisting of Motion Gaussians for dynamics and Appearance Gaussians for appearance, coupled with a Morton-based 2D parameterization to produce stable position/attribute maps learned via a 2D CNN/U-Net pipeline. A semantic-aware alignment module enables motion transfer across performers by associating Gaussians and applying deformation transfer to preserve topology, enabling photoreal re-performance under novel motions. Experiments on multi-view data show strong rendering fidelity and superior generalization to unseen motions while maintaining efficient training relative to per-frame optimization. This approach advances immersive playback-to-reperformance capabilities with potential impact on telepresence, film production, and VR experiences.

Abstract

Human-centric volumetric videos offer immersive free-viewpoint experiences, yet existing methods focus either on replaying general dynamic scenes or animating human avatars, limiting their ability to re-perform general dynamic scenes. In this paper, we present RePerformer, a novel Gaussian-based representation that unifies playback and re-performance for high-fidelity human-centric volumetric videos. Specifically, we hierarchically disentangle the dynamic scenes into motion Gaussians and appearance Gaussians, which are associated in the canonical space. We further employ a Morton-based parameterization to efficiently encode the appearance Gaussians into 2D position and attribute maps. For enhanced generalization, we adopt 2D CNNs to map position maps to attribute maps, which can be assembled into appearance Gaussians for high-fidelity rendering of the dynamic scenes. For re-performance, we develop a semantic-aware alignment module and apply deformation transfer on motion Gaussians, enabling photo-real rendering under novel motions. Extensive experiments validate the robustness and effectiveness of RePerformer, setting a new benchmark for the playback-then-reperformance paradigm in human-centric volumetric videos.
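To make the Morton-based parameterization more concrete, here is a minimal sketch of one plausible reading: canonical Gaussian centers are sorted along a 3D Morton (Z-order) curve, and each point's rank in that order is decoded as a 2D Morton index to obtain a pixel location in a square map. The abstract does not specify the exact procedure, so the function names (`interleave3`, `deinterleave2`, `morton_position_map`), the 10-bit quantization, and the power-of-two map size are all assumptions made for illustration.

```python
def interleave3(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one 3D Morton code."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code


def deinterleave2(i: int, bits: int = 16) -> tuple:
    """Decode a 2D Morton index i into pixel coordinates (u, v)."""
    u = v = 0
    for b in range(bits):
        u |= ((i >> (2 * b)) & 1) << b
        v |= ((i >> (2 * b + 1)) & 1) << b
    return u, v


def morton_position_map(points, bits: int = 10):
    """Assign each 3D point a pixel (u, v) in a square map (hypothetical helper).

    points: list of (x, y, z) floats, e.g. canonical Gaussian centers.
    Returns (uv, side), where uv maps point index -> (u, v) and side is
    the map width/height, rounded up to a power of two.
    """
    n = len(points)
    lo = [min(p[k] for p in points) for k in range(3)]
    hi = [max(p[k] for p in points) for k in range(3)]
    scale = (1 << bits) - 1

    def quantize(p):
        # Normalize each axis to the integer grid [0, 2^bits - 1].
        return tuple(
            int((p[k] - lo[k]) / (hi[k] - lo[k]) * scale) if hi[k] > lo[k] else 0
            for k in range(3)
        )

    codes = [interleave3(*quantize(p), bits=bits) for p in points]
    # Sorting by Morton code keeps spatially close points close in rank.
    order = sorted(range(n), key=lambda idx: codes[idx])

    side = 1
    while side * side < n:
        side *= 2

    # Decoding the rank along a 2D Z-order curve keeps neighbors local in the map.
    uv = {idx: deinterleave2(rank) for rank, idx in enumerate(order)}
    return uv, side


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0),
           (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    uv, side = morton_position_map(pts)
    for idx, (u, v) in sorted(uv.items()):
        print(idx, pts[idx], "->", (u, v), "in a", side, "x", side, "map")
```

Because the mapping depends only on the canonical positions, it stays fixed across frames, which is what allows a 2D CNN to regress attribute maps from position maps with a consistent pixel layout.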

Paper Structure

This paper contains 14 sections, 9 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: We introduce RePerformer, a Gaussian-based approach for robust, high-fidelity volumetric video playback and realistic re-performance of general dynamic scenes. (Left: Sequential bass performances. Right: Synchronized violin players.)
  • Figure 2: Overview of RePerformer. Our method disentangles compact motion Gaussians and dense appearance Gaussians, repacking the appearance Gaussians' positions into 2D maps for network regression. (a) We use Morton parameterization to project the canonical appearance Gaussians onto UV space, forming a consistent $i \rightarrow (u,v)$ mapping. (b) RePerformer feeds the 2D position map after Morton-based parameterization into three powerful 2D CNNs with a self-attention layer to regress the corresponding attribute maps.
  • Figure 3: The re-performance pipeline consists of two components: template alignment and motion transfer.
  • Figure 4: Gallery of our results. RePerformer delivers high-fidelity rendering of human performance in challenging motions and achieves vivid re-performance across various complex human-object scenarios.
  • Figure 5: Qualitative comparison with SOTA playback-only methods for novel view synthesis on the DualGS dataset.
  • ...and 7 more figures