Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video

Yiqun Zhao; Chenming Wu; Binbin Huang; Yihao Zhi; Chen Zhao; Jingdong Wang; Shenghua Gao

TL;DR

This work tackles fast, relightable reconstruction of dynamic clothed humans from monocular video by introducing SGIA, a surfel-based Gaussian inverse avatar. SGIA combines a canonical PBR-aware 2D Gaussian Splatting (2DGS) representation with SMPL-driven articulation and latent bones, enabling efficient forward rendering with image-based lighting and a pre-integrated lighting model based on the split-sum approximation. The method approximates occlusion using the SMPL mesh and adopts a progressive training strategy to jointly recover geometry and physically-based rendering properties (albedo, roughness, metallic) under unknown illumination. Experiments on synthetic and real datasets show substantial speedups (about 40 minutes of training and about 5 FPS rendering) while achieving competitive PBR property estimation and realistic relighting under novel poses and illuminations, highlighting the method's practical potential for virtual production and real-time applications.
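To make the SMPL-driven articulation concrete, here is a minimal NumPy sketch of deforming canonical surfels with linear blend skinning (LBS). It assumes each 2DGS surfel is described by a center and two tangent vectors, that the bones (SMPL joints plus any latent bones) are given as rigid 4x4 transforms, and that the names `lbs_surfels` and `translation` are hypothetical; it illustrates the technique and is not the authors' implementation.

```python
import numpy as np


def lbs_surfels(centers, tangents, weights, bone_transforms):
    """Deform canonical 2D Gaussian surfels to world space with linear blend skinning.

    centers:         (N, 3) canonical surfel centers
    tangents:        (N, 3, 2) canonical tangent vectors t_u, t_v (one per column)
    weights:         (N, K) skinning weights, each row summing to 1
    bone_transforms: (K, 4, 4) rigid bone transforms (e.g. SMPL joints plus latent bones)
    """
    # Blend the per-bone 4x4 matrices with each surfel's skinning weights -> (N, 4, 4)
    blended = np.einsum("nk,kij->nij", weights, bone_transforms)
    linear, offset = blended[:, :3, :3], blended[:, :3, 3]
    # Centers receive the full affine transform; tangent directions only the linear part
    world_centers = np.einsum("nij,nj->ni", linear, centers) + offset
    world_tangents = np.einsum("nij,njk->nik", linear, tangents)
    # The surfel normal is the normalized cross product of its two tangents
    normals = np.cross(world_tangents[:, :, 0], world_tangents[:, :, 1])
    normals = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    return world_centers, world_tangents, normals


def translation(t):
    """Hypothetical helper: a 4x4 pure-translation bone transform."""
    transform = np.eye(4)
    transform[:3, 3] = t
    return transform


# Two surfels in the xy-plane, skinned to a static bone and a bone moved +2 along y
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
tangents = np.tile(np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]), (2, 1, 1))
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
bones = np.stack([np.eye(4), translation([0.0, 2.0, 0.0])])
world_centers, world_tangents, normals = lbs_surfels(centers, tangents, weights, bones)
print(world_centers)
```

Blending the 4x4 matrices first, rather than blending already-transformed points, lets one blended matrix move both the surfel center and its tangent frame, so the surfel normal follows the deformation. The second surfel, weighted half to each bone, ends up shifted by half the bone's translation.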

Abstract

Efficient and accurate reconstruction of a relightable, dynamic clothed human avatar from a monocular video is crucial for the entertainment industry. This paper presents SGIA (Surfel-based Gaussian Inverse Avatar), which enables efficient training and rendering for relightable dynamic human reconstruction. SGIA advances previous Gaussian Avatar methods by comprehensively modeling Physically-Based Rendering (PBR) properties for clothed human avatars, allowing the avatars to be manipulated into novel poses under diverse lighting conditions. Specifically, our approach integrates pre-integration and image-based lighting for fast lighting computation that outperforms existing implicit-based techniques. To address the challenges of material-lighting disentanglement and accurate geometry reconstruction, we propose an occlusion approximation strategy and a progressive training approach. Extensive experiments demonstrate that SGIA not only estimates physical properties with high accuracy but also significantly improves the realism of relighting for dynamic human avatars, while offering a substantial speed advantage. More results are available on our project page: https://GS-IA.github.io.
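As an illustration of pre-integrated image-based lighting, the following is a minimal sketch of split-sum shading for one surface point with albedo, roughness, and metallic. It assumes the environment has already been convolved into a diffuse irradiance value and a roughness-prefiltered specular value (both passed in directly), and it swaps the usual precomputed 2D BRDF lookup table for Brian Karis's published analytic fit from Unreal Engine's mobile renderer. The function names are hypothetical, and this is not presented as the paper's exact shading model.

```python
import numpy as np


def env_brdf_approx(roughness, n_dot_v):
    """Analytic fit to the split-sum DFG lookup table (Karis, UE4 mobile shading).

    Returns (A, B) such that the pre-integrated specular BRDF equals F0 * A + B.
    Stands in for the precomputed 2D lookup table a full renderer would use.
    """
    c0 = np.array([-1.0, -0.0275, -0.572, 0.022])
    c1 = np.array([1.0, 0.0425, 1.04, -0.04])
    r = roughness * c0 + c1
    a004 = min(r[0] * r[0], 2.0 ** (-9.28 * n_dot_v)) * r[0] + r[1]
    return -1.04 * a004 + r[2], 1.04 * a004 + r[3]


def shade_split_sum(albedo, roughness, metallic, n_dot_v, irradiance, prefiltered):
    """Split-sum image-based lighting for a single surface point.

    albedo:      (3,) base color in [0, 1]
    irradiance:  (3,) cosine-convolved environment radiance at the normal, divided by pi
    prefiltered: (3,) environment radiance prefiltered for this roughness,
                 looked up along the reflection direction
    """
    albedo = np.asarray(albedo, dtype=float)
    # Dielectrics reflect about 4% at normal incidence; metals tint F0 with the albedo
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    a, b = env_brdf_approx(roughness, n_dot_v)
    # Metals have no diffuse lobe, so the Lambertian term is scaled by (1 - metallic)
    diffuse = (1.0 - metallic) * albedo * np.asarray(irradiance, dtype=float)
    # Split sum: (prefiltered radiance) x (pre-integrated BRDF)
    specular = np.asarray(prefiltered, dtype=float) * (f0 * a + b)
    return diffuse + specular


rgb = shade_split_sum(albedo=[0.8, 0.6, 0.5], roughness=0.5, metallic=0.0, n_dot_v=0.7,
                      irradiance=[0.9, 0.9, 1.0], prefiltered=[0.6, 0.6, 0.7])
print(rgb)
```

The split sum factors the lighting integral into a prefiltered environment lookup and a BRDF term that depends only on roughness and the view angle, which is why shading each point reduces to a few lookups and multiply-adds instead of Monte Carlo integration.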

Paper Structure (30 sections, 23 equations, 14 figures, 7 tables)

Introduction
Related Work
Inverse rendering for static scenes
Clothed human avatar modeling
PBR properties reconstruction of clothed human avatars
Our Method
A revisit of 2DGS
Clothed human avatars as dynamic surfels animated by template models
Physically-Based Rendering with image-based Lighting
Reconstruct animatable PBR-aware 2DGS from monocular videos
Novel pose animation under novel illuminations
Experiments
Evaluation datasets
Baselines
Evaluation Metrics
...and 15 more sections

Figures (14)

Figure 1: We achieve fast reconstruction of clothed human avatars with PBR properties from a monocular video. SGIA takes a monocular video and initial human pose and shape as input and estimates the PBR properties of dynamic clothed humans, including geometry and materials. Leveraging a PBR-aware 2DGS representation, our method enables fast training and rendering. Using the estimated PBR properties, we can not only deform the avatars into different poses but also render them under realistic lighting conditions, enabling versatile and visually appealing outputs.
Figure 2: Overview of our pipeline. We define the radiance ($c_k$) and materials ($a_k, r_k, m_k$) in the canonical space as canonical PBR-aware 2DGS and deform them to the world space via Linear Blend Skinning (LBS). We first optimize the image reconstruction loss to obtain the initial shape of the clothed avatar. Based on this rough shape, we optimize the Gaussian attributes with Physically-Based Rendering. To model shadows and decouple the materials from the lighting, we approximate the occlusion computation with the template mesh. Additionally, we apply regularization, namely a smoothness loss, a white loss, and a normal consistency loss, to obtain a plausible solution for the PBR materials.
Figure 3: Visualization of our PBR optimization stage. At the initial stage, neither the D-normal $\mathbf{N}$ (normal derived from depth points) nor the R-normal (rendered splat normal) is accurate. After the Geometry Fix Stage ($\lambda_{\text{NC}_1} = 1$, $\lambda_{\text{NC}_2} = 0$), the geometry is repaired, while the splat normal is still noisy. We then set $\lambda_{\text{NC}_1} = 0$ and $\lambda_{\text{NC}_2} = 1$, and finally obtain geometry and splat normals that are both consistent and accurate.
Figure 4: Qualitative results on the RANA dataset. We visualize the PBR image, albedo, and normal from the training view. We then relight the clothed human avatars in two different novel poses under novel illuminations. As highlighted by the red rectangles, IntrinsicAvatar wang2024intrinsicavatar may optimize spurious normals to fit the appearance, while our method recovers more realistic normals for better relighting.
Figure 5: Qualitative comparisons on the PeopleSnapshot dataset. We visualize the PBR image, albedo, and normal from the training view. We then relight the clothed human avatars at training pose and novel pose under the novel illumination.
...and 9 more figures
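The progressive normal-consistency schedule described in the Figure 3 caption can be sketched as a stage-dependent pair of weights plus a 2DGS-style consistency term $1 - \langle \mathbf{n}, \mathbf{N} \rangle$ between splat normals and depth normals. The caption gives only the stage weights; the stage boundaries (`geometry_fix_start`, `splat_fix_start`), the function names, and the zero weights before the Geometry Fix Stage are hypothetical. One plausible reading, also an assumption, is that $\text{NC}_1$ and $\text{NC}_2$ share this value and differ only in which normal receives gradients in an autodiff framework (depth normals for $\text{NC}_1$, splat normals for $\text{NC}_2$); plain NumPy computes the value only.

```python
import numpy as np


def nc_weights(step, geometry_fix_start, splat_fix_start):
    """Return (lambda_nc1, lambda_nc2) for a three-stage schedule.

    The stage boundaries are hypothetical; the weight values follow Figure 3.
    """
    if step < geometry_fix_start:
        return 0.0, 0.0  # earlier stage: normal consistency off (assumption)
    if step < splat_fix_start:
        return 1.0, 0.0  # Geometry Fix Stage
    return 0.0, 1.0      # splat-normal refinement stage


def normal_consistency(splat_normals, depth_normals, pixel_weights):
    """Weighted mean of 1 - <n, N> between rendered splat normals and depth normals.

    splat_normals, depth_normals: (H, W, 3) unit normal maps
    pixel_weights:                (H, W) per-pixel weights, e.g. accumulated opacity
    """
    cos = np.sum(splat_normals * depth_normals, axis=-1)
    total = max(float(np.sum(pixel_weights)), 1e-8)
    return float(np.sum(pixel_weights * (1.0 - cos)) / total)


# Toy 4x4 normal maps: identical maps give zero loss, flipped maps give the maximum of 2
n = np.tile([0.0, 0.0, 1.0], (4, 4, 1))
w = np.ones((4, 4))
for step in (0, 1500, 4000):
    lam1, lam2 = nc_weights(step, geometry_fix_start=1000, splat_fix_start=3000)
    print(step, lam1, lam2, normal_consistency(n, -n, w))
```

Switching the weights rather than summing both terms throughout lets one side of the consistency constraint settle before the other is pulled toward it, matching the caption's observation that geometry is repaired first and splat normals afterwards.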
