RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance

Yuheng Jiang; Zhehao Shen; Chengcheng Guo; Yu Hong; Zhuo Su; Yingliang Zhang; Marc Habermann; Lan Xu

TL;DR

RePerformer tackles the challenge of delivering both faithful playback and vivid re-performance of general non-rigid, human-centric volumetric videos. It introduces a hierarchical Gaussian representation that separates motion Gaussians, which capture the dynamics, from appearance Gaussians, which capture the appearance. A Morton-based parameterization encodes the appearance Gaussians into stable 2D position and attribute maps, and a 2D CNN (U-Net) learns to predict the attribute maps from the position maps. For motion transfer across performers, a semantic-aware alignment module associates the Gaussians, and deformation transfer applied to the motion Gaussians preserves topology, enabling photoreal re-performance under novel motions. Experiments on multi-view data show high rendering fidelity and superior generalization to unseen motions, while training remains more efficient than per-frame optimization. The approach extends immersive volumetric video from playback to re-performance, with potential applications in telepresence, film production, and VR.
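The Morton-based parameterization can be illustrated with a minimal sketch. Purely for illustration, it assumes that each appearance Gaussian is represented by its canonical center, that centers are quantized to 10 bits per axis inside their bounding box, and that the Morton-sorted Gaussians are packed row-major into a fixed-width grid with `None` padding. `morton_code` and `morton_position_map` are hypothetical names, and the paper's actual quantization, map resolution, and padding scheme may differ.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def _spread_bits(v: int, bits: int = 10) -> int:
    """Insert two zero bits between each of the low `bits` bits of v."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_code(p: Point, lo: Point, hi: Point, bits: int = 10) -> int:
    """Quantize p to a 2**bits grid inside [lo, hi], then interleave x, y, z bits."""
    scale = (1 << bits) - 1
    q = []
    for c, a, b in zip(p, lo, hi):
        t = 0.0 if b == a else (c - a) / (b - a)  # normalize to [0, 1]
        t = min(max(t, 0.0), 1.0)
        q.append(int(round(t * scale)))
    return (_spread_bits(q[0], bits)
            | (_spread_bits(q[1], bits) << 1)
            | (_spread_bits(q[2], bits) << 2))


def morton_position_map(
    points: Sequence[Point], width: int, bits: int = 10
) -> Tuple[List[List[Optional[Point]]], List[int]]:
    """Sort canonical centers by Morton code and pack them row-major into a 2D map.

    Returns (grid, order): grid[r][c] holds a center or None (padding), and
    order[k] is the index of the point stored at flat pixel k. Assumes a
    non-empty point set.
    """
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    order = sorted(range(len(points)),
                   key=lambda i: morton_code(points[i], lo, hi, bits))
    height = -(-len(points) // width)  # ceiling division
    grid: List[List[Optional[Point]]] = [[None] * width for _ in range(height)]
    for k, idx in enumerate(order):
        grid[k // width][k % width] = points[idx]
    return grid, order
```

Z-order sorting tends to keep spatially close Gaussians close in the 2D layout, which is what makes local 2D convolutions meaningful on such a map. Because the order depends only on canonical positions, the layout stays fixed across frames as long as the canonical centers do not change.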

Abstract

Human-centric volumetric videos offer immersive free-viewpoint experiences, yet existing methods focus either on replaying general dynamic scenes or on animating human avatars, which limits their ability to re-perform general dynamic scenes. In this paper, we present RePerformer, a novel Gaussian-based representation that unifies playback and re-performance for high-fidelity human-centric volumetric videos. Specifically, we hierarchically disentangle the dynamic scenes into motion Gaussians and appearance Gaussians, which are associated in the canonical space. We further employ a Morton-based parameterization to efficiently encode the appearance Gaussians into 2D position and attribute maps. For enhanced generalization, we adopt 2D CNNs to map position maps to attribute maps, which can be assembled into appearance Gaussians for high-fidelity rendering of the dynamic scenes. For re-performance, we develop a semantic-aware alignment module and apply deformation transfer on motion Gaussians, enabling photoreal rendering under novel motions. Extensive experiments validate the robustness and effectiveness of RePerformer, setting a new benchmark for the playback-then-reperformance paradigm in human-centric volumetric videos.
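To make the motion/appearance hierarchy concrete, here is a sketch of one common way to let sparse motion Gaussians drive dense appearance Gaussians that are associated in canonical space. Each appearance Gaussian is bound to its `k` nearest motion Gaussians with normalized Gaussian-falloff weights, and its posed center is a weighted blend of those motion Gaussians' rigid transforms, in the style of embedded deformation: x' = Σ_j w_j (R_j (x − c_j) + c_j + t_j). The binding rule, `k`, `sigma`, and the helper names `bind` and `deform` are assumptions for illustration, not RePerformer's published scheme, and only centers are deformed here (the rotations and covariances of appearance Gaussians are ignored).

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 rotation matrix
Binding = List[Tuple[int, float]]  # (motion Gaussian index, weight)


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def _add(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x + y for x, y in zip(a, b))


def _scale(a: Vec3, s: float) -> Vec3:
    return tuple(x * s for x in a)


def _matvec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def _sqdist(a: Vec3, b: Vec3) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bind(appearance_pts: Sequence[Vec3], motion_centers: Sequence[Vec3],
         k: int = 4, sigma: float = 0.1) -> List[Binding]:
    """Hypothetical canonical-space binding: k nearest motion Gaussians per
    appearance Gaussian, with Gaussian-falloff weights normalized to sum to 1."""
    bindings = []
    for x in appearance_pts:
        nearest = sorted(range(len(motion_centers)),
                         key=lambda j: _sqdist(x, motion_centers[j]))[:k]
        raw = [math.exp(-_sqdist(x, motion_centers[j]) / (2 * sigma ** 2))
               for j in nearest]
        total = sum(raw)
        if total == 0.0:  # every weight underflowed: fall back to uniform
            raw, total = [1.0] * len(nearest), float(len(nearest))
        bindings.append([(j, w / total) for j, w in zip(nearest, raw)])
    return bindings


def deform(appearance_pts: Sequence[Vec3], motion_centers: Sequence[Vec3],
           rotations: Sequence[Mat3], translations: Sequence[Vec3],
           bindings: List[Binding]) -> List[Vec3]:
    """Blend the per-frame rigid transforms of the bound motion Gaussians:
    x' = sum_j w_j * (R_j (x - c_j) + c_j + t_j)."""
    out = []
    for x, links in zip(appearance_pts, bindings):
        acc: Vec3 = (0.0, 0.0, 0.0)
        for j, w in links:
            c = motion_centers[j]
            moved = _add(_add(_matvec(rotations[j], _sub(x, c)), c), translations[j])
            acc = _add(acc, _scale(moved, w))
        out.append(acc)
    return out
```

Because the weights sum to one, a motion that translates every motion Gaussian by the same offset moves each appearance Gaussian by exactly that offset, which is a quick sanity check for any binding of this kind.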

