Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras
Ashwath Shetty, Marc Habermann, Guoxing Sun, Diogo Luvizon, Vladislav Golyanik, Christian Theobalt
TL;DR
Holoported Characters is a real-time, free-viewpoint rendering pipeline for humans captured with sparse RGB cameras. It reaches 4K output by combining an explicit deformable character model, a projective texturing stage, and a refinement stage that performs texture translation and super-resolution. At inference, the three-stage pipeline takes four input views and the skeletal pose, and reproduces detail in clothing, facial expressions, and hand gestures while maintaining multi-view consistency. Training relies on dense multi-view video and a rigged 3D scan of the actor, which lets the model learn priors for high-frequency appearance without heavy hallucination. The approach runs in real time at 4K on multi-GPU hardware and outperforms state-of-the-art real-time methods in fidelity, making it suitable for immersive telepresence and merged-reality applications.
Abstract
We present the first approach to render highly realistic free-viewpoint videos of a human actor in general apparel, from sparse multi-view recording to display, in real time at an unprecedented 4K resolution. At inference, our method only requires four camera views of the moving actor and the respective 3D skeletal pose. It handles actors in wide clothing, and reproduces even fine-scale dynamic detail, e.g., clothing wrinkles, facial expressions, and hand gestures. At training time, our learning-based approach expects dense multi-view video and a rigged static surface scan of the actor. Our method comprises three main stages. Stage 1 is a skeleton-driven neural approach for high-quality capture of the detailed dynamic mesh geometry. Stage 2 is a novel solution to create a view-dependent texture using four test-time camera views as input. Finally, stage 3 comprises a new image-based refinement network rendering the final 4K image given the output from the previous stages. Our approach establishes a new benchmark for real-time rendering resolution and quality using sparse input camera views, unlocking possibilities for immersive telepresence.
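
To make the idea of texturing a surface from a few camera views more concrete, the following minimal Python sketch projects a surface point into a pinhole camera and blends the colors observed by several cameras. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the four views are combined, so a simple cosine-weighted blend is used here as a stand-in. The function names (`project`, `blend_weights`, `blend_colors`) and all parameters are hypothetical, and occlusion handling is omitted.

```python
import math

def project(point, R, t, f, cx, cy):
    """Project a 3D world point into a pinhole camera (assumed model).

    R is a 3x3 world-to-camera rotation (list of rows), t a 3-vector,
    f the focal length in pixels, (cx, cy) the principal point.
    Returns pixel coordinates (u, v), or None if the point is behind the camera.
    """
    # World -> camera coordinates: X_cam = R * X_world + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)

def blend_weights(point, normal, cam_centers):
    """Cosine-based weights: cameras facing the surface head-on count more.

    `normal` is assumed to be a unit vector. Cameras that see the surface
    from behind get weight 0. Visibility (occlusion) is NOT checked here.
    """
    weights = []
    for c in cam_centers:
        d = [c[i] - point[i] for i in range(3)]       # point -> camera
        length = math.sqrt(sum(v * v for v in d))
        cos = sum(normal[i] * d[i] for i in range(3)) / length
        weights.append(max(cos, 0.0))
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]

def blend_colors(colors, weights):
    """Weighted average of per-camera RGB samples."""
    return tuple(sum(w * c[k] for w, c in zip(weights, colors)) for k in range(3))

# Example: one surface point at the origin facing +z, seen by four cameras.
point, normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
centers = [(0.0, 0.0, 2.0), (0.0, 0.0, -2.0), (2.0, 0.0, 0.0), (0.0, 2.0, 2.0)]
weights = blend_weights(point, normal, centers)
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
texel = blend_colors(colors, weights)

# Pixel location of the point in a camera two units in front of it.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project(point, identity, (0.0, 0.0, 2.0), 1000.0, 960.0, 540.0)
```

In this example the camera behind the surface and the camera viewing it edge-on receive zero weight, so only the two front-facing cameras contribute to the blended color. A practical system would additionally test visibility against the mesh geometry before sampling a view.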
