Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras

Ashwath Shetty, Marc Habermann, Guoxing Sun, Diogo Luvizon, Vladislav Golyanik, Christian Theobalt

TL;DR

Holoported Characters introduces a real-time, free-viewpoint rendering pipeline for humans from sparse RGB cameras, achieving 4K output by integrating an explicit deformable character model, a projective texturing stage, and a texture-translation plus super-resolution refinement. The three-stage pipeline uses four input views and skeletal poses to produce faithful detail in clothing, facial expressions, and hand gestures, while maintaining multi-view consistency. Training relies on dense multi-view video and a rigged 3D scan, enabling learned priors that support high-frequency appearance without heavy hallucination. The approach delivers real-time 4K performance on multi-GPU hardware and outperforms state-of-the-art real-time methods in fidelity, while remaining suitable for immersive telepresence and merged-reality applications.

Abstract

We present the first approach to render highly realistic free-viewpoint videos of a human actor in general apparel, from sparse multi-view recording to display, in real-time at an unprecedented 4K resolution. At inference, our method only requires four camera views of the moving actor and the respective 3D skeletal pose. It handles actors in wide clothing, and reproduces even fine-scale dynamic detail, e.g. clothing wrinkles, face expressions, and hand gestures. At training time, our learning-based approach expects dense multi-view video and a rigged static surface scan of the actor. Our method comprises three main stages. Stage 1 is a skeleton-driven neural approach for high-quality capture of the detailed dynamic mesh geometry. Stage 2 is a novel solution to create a view-dependent texture using four test-time camera views as input. Finally, stage 3 comprises a new image-based refinement network rendering the final 4K image given the output from the previous stages. Our approach establishes a new benchmark for real-time rendering resolution and quality using sparse input camera views, unlocking possibilities for immersive telepresence.

Paper Structure (39 sections, 16 equations, 20 figures, 6 tables)

Figures (20)

  • Figure 1: We propose Holoported Characters, a novel approach for real-time free-viewpoint rendering of humans at 4K resolution. During inference, our method only requires four sparse images observing the human and the respective 3D skeletal pose. Then, our three-stage pipeline generates novel views of the performance in real-time and at an unprecedented resolution of 4K. We highlight that our approach can account for detailed effects such as clothing wrinkles, facial expressions, and hand gestures.
  • Figure 2: Method Overview. Holoported Characters takes sparse camera views, the respective 3D skeletal pose, and the camera parameters of the novel view as input and generates a high-resolution rendering in real-time. Our character model takes the motion as input and predicts a pose-dependent deformation of the template mesh. Then, our projective texturing pipeline maps the sparse views onto this mesh’s texture space. This texture, camera encoding, and posed normal maps are then fed into our TexFeatNet, producing a view-dependent dynamic texture feature. Finally, our SRNet takes those low-resolution features in image space and generates the high-resolution rendering.
  • Figure 3: Recovery of Geometric Details. Our proposed point-cloud supervision and hand modeling help us recover more details, such as wrinkles and hand gestures, compared to the baseline.
  • Figure 4: Projective Texturing. Given a sparse set of cameras and the posed character mesh, our method recovers a partial texture map using projective texturing, where pixels in screen space are mapped to texels of the texture map.
  • Figure 5: Qualitative Results for Novel Poses and Views. Our method generates high-quality renderings showing realistic wrinkle patterns and high-frequency details such as hand gestures and facial expressions. Note that our method is robust to challenging poses like squats and complicated clothing types such as loose skirts and highly textured garments, e.g. the pullover.
  • ...and 15 more figures
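
The projective texturing step described in the Figure 4 caption (screen-space pixels mapped to texels of a texture map) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flat list-of-texels layout, the pinhole camera parameters (`R`, `t`, `fx`, `fy`, `cx`, `cy`), and nearest-pixel sampling are all assumptions made for illustration. The sketch also omits the occlusion (depth) test and the fusion of several camera views that a real pipeline would need; it only rejects texels that are behind the camera, back-facing, or outside the image, which is why the result is a partial texture.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[int, int, int]


def project(point: Vec3, R: Sequence[Sequence[float]], t: Vec3,
            fx: float, fy: float, cx: float, cy: float
            ) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a world-space point; None if behind the camera."""
    # World -> camera coordinates: X_cam = R * X_world + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        return None
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def projective_texture(texel_points: Sequence[Vec3],
                       texel_normals: Sequence[Vec3],
                       image: Sequence[Sequence[Color]],
                       R: Sequence[Sequence[float]], t: Vec3,
                       fx: float, fy: float, cx: float, cy: float
                       ) -> List[Optional[Color]]:
    """Fill a (flattened) texture from one camera image.

    texel_points[i] / texel_normals[i] are the hypothetical 3D surface point
    and normal that texel i covers on the posed mesh. Texels the camera
    cannot provide a color for stay None. No occlusion test is performed.
    """
    height, width = len(image), len(image[0])
    # Camera centre in world coordinates: C = -R^T * t
    cam_center = [-sum(R[j][i] * t[j] for j in range(3)) for i in range(3)]
    texture: List[Optional[Color]] = []
    for p, n in zip(texel_points, texel_normals):
        color = None
        uv = project(p, R, t, fx, fy, cx, cy)
        if uv is not None:
            # Back-face test: keep texels whose normal points toward the camera.
            to_cam = [cam_center[i] - p[i] for i in range(3)]
            facing = sum(n[i] * to_cam[i] for i in range(3)) > 0.0
            # Nearest-pixel lookup (a real system would likely interpolate).
            x, y = int(round(uv[0])), int(round(uv[1]))
            if facing and 0 <= x < width and 0 <= y < height:
                color = image[y][x]
        texture.append(color)
    return texture
```

Texels left as `None` correspond to the unobserved regions of the partial texture map; in the paper's pipeline, completing such a texture is the role of the later learned stages, which this sketch does not attempt to model.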