Interactive Rendering of Relightable and Animatable Gaussian Avatars

Youyi Zhan, Tianjia Shao, He Wang, Yin Yang, Kun Zhou

TL;DR

The paper addresses the creation of relightable, animatable digital humans from sparse-view or monocular video. Its Relightable Gaussian Avatar decouples material properties from lighting: Gaussian primitives are defined in canonical space and forward-skinned to the target pose. A learnable environment map, together with explicit visibility (shadow) estimation via fast mesh rasterization, enables high-quality relighting under novel viewpoints, poses, and illumination at interactive rates (≈6.9 fps). Training is differentiable and uses densification and scale regularizers to stabilize the separation of material, geometry, and lighting; the method also supports appearance editing from a single view. Results on synthetic and real data show improvements in detail, shading realism, and rendering speed over strong baselines, and ablations confirm the contribution of visibility, densification, and pose-related components.

Abstract

Creating relightable and animatable avatars from multi-view or monocular videos is a challenging task for digital human creation and virtual reality applications. Previous methods rely on neural radiance fields or ray tracing, resulting in slow training and rendering processes. By utilizing Gaussian Splatting, we propose a simple and efficient method to decouple body materials and lighting from sparse-view or monocular avatar videos, so that the avatar can be rendered simultaneously under novel viewpoints, poses, and lightings at interactive frame rates (6.9 fps). Specifically, we first obtain the canonical body mesh using a signed distance function and assign attributes to each mesh vertex. The Gaussians in the canonical space then interpolate from nearby body mesh vertices to obtain the attributes. We subsequently deform the Gaussians to the posed space using forward skinning, and combine the learnable environment light with the Gaussian attributes for shading computation. To achieve fast shadow modeling, we rasterize the posed body mesh from dense viewpoints to obtain the visibility. Our approach is not only simple but also fast enough to allow interactive rendering of avatar animation under environmental light changes. Experiments demonstrate that, compared to previous works, our method can render higher quality results at a faster speed on both synthetic and real datasets.
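
The two geometric steps named in the abstract, attribute interpolation from nearby mesh vertices and forward skinning to the posed space, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the inverse-distance weighting over the k nearest vertices, and the use of plain linear blend skinning are all assumptions chosen for illustration, since the abstract does not specify the interpolation weights or the exact skinning formulation.

```python
# Illustrative sketch only; all names and weighting choices are hypothetical.
import math


def interpolate_attribute(point, vertices, attributes, k=3, eps=1e-8):
    """Blend the attributes of the k nearest mesh vertices.

    Assumption: inverse-distance weights. The paper only states that
    Gaussians interpolate attributes from nearby vertices.
    """
    dists = [math.dist(point, v) for v in vertices]
    nearest = sorted(range(len(vertices)), key=lambda i: dists[i])[:k]
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)
    dim = len(attributes[0])
    return [
        sum(w * attributes[i][d] for w, i in zip(weights, nearest)) / total
        for d in range(dim)
    ]


def forward_skin(point, skin_weights, bone_transforms):
    """Linear blend skinning: x' = sum_b w_b * (R_b @ x + t_b).

    bone_transforms is a list of (R, t) pairs, where R is a 3x3 rotation
    given as nested lists and t is a length-3 translation.
    """
    posed = [0.0, 0.0, 0.0]
    for w, (R, t) in zip(skin_weights, bone_transforms):
        for r in range(3):
            rotated = sum(R[r][c] * point[c] for c in range(3))
            posed[r] += w * (rotated + t[r])
    return posed


# Toy usage: three canonical vertices carrying an RGB albedo attribute.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
albedo = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
gaussian_center = (0.2, 0.1, 0.05)
gaussian_albedo = interpolate_attribute(gaussian_center, vertices, albedo)

# Two bones: identity, and a translation by 2 along x, blended equally.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bones = [(identity, (0.0, 0.0, 0.0)), (identity, (2.0, 0.0, 0.0))]
posed_center = forward_skin(gaussian_center, [0.5, 0.5], bones)
```

In this sketch the attributes live on the mesh vertices and the Gaussians only read them, which mirrors the abstract's description of assigning attributes per vertex and interpolating them onto canonical-space Gaussians.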

Paper Structure

This paper contains 24 sections, 13 equations, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: Pipeline overview. Starting from the canonical mesh reconstructed from SDF, Gaussians are initialized near the mesh surface. The attributes of the Gaussians are interpolated from the neighboring vertices. Then Gaussians are deformed to the posed space and rasterized to produce an image (Section \ref{sec:gaussianavatar}). Visibility is obtained from multi-view rendering of the posed mesh to model shadows (Section \ref{sec:visibility}). Through photometric loss and other constraints, the environmental light and body materials can be separated for further relighting (Section \ref{sec:training}).
  • Figure 2: Qualitative comparison with RA-Lin [lin2024relightable], MeshAvatar [chen2024meshavatar], RA-Xu [xu2023relightable], IA [wang2023intrinsicavatar] and R4D [chen2022relighting4d]. We show the albedo and the relighting results under training and novel poses on both synthetic data (jody, rendered at test viewpoints) and real data (ZJU-377 and male-3-casual, rendered at training viewpoints). Compared to the baselines, our method achieves finer body details (jody's leggings, ZJU-377's face and male-3-casual's jeans) and specular effects that are closest to the ground truth (jody's leggings).
  • Figure 3: Ablation study on non-rigid displacement, our densification and visibility. All results are rendered under novel poses and new environment light.
  • Figure 4: Ablation study on using SMPL mesh. We present the normal and relighting results using the SMPL mesh and SDF mesh. With SMPL mesh, Gaussians may access wrong normal attributes, resulting in inaccurate relighting results.
  • Figure 5: Ablation study on scale loss. We present the rendering and relighting results from a novel viewpoint. Without scale loss, the large Gaussians create artifacts under the arms and produce a scaly appearance on the leg under novel lighting.
  • ...and 8 more figures
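
The role of visibility in shading, described in the Figure 1 caption, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's shading model: it covers only a Lambertian (diffuse) term, treats the environment light as a list of discrete directions that each cover an equal solid angle, and takes the per-direction visibility as a given 0/1 value (in the paper it comes from rasterizing the posed mesh from dense viewpoints). The function and parameter names are hypothetical.

```python
# Illustrative sketch only; diffuse-only shading with hypothetical names.
import math


def shade_diffuse(albedo, normal, light_dirs, light_radiance, visibility):
    """Visibility-weighted Lambertian shading over discrete light directions.

    Assumption: each of the N directions covers an equal solid angle 4*pi/N.
    visibility[i] is 1.0 if light direction i is unoccluded, else 0.0.
    """
    solid_angle = 4.0 * math.pi / len(light_dirs)
    irradiance = [0.0, 0.0, 0.0]
    for direction, radiance, vis in zip(light_dirs, light_radiance, visibility):
        # Clamped cosine between the surface normal and the light direction.
        cos_term = max(0.0, sum(n * w for n, w in zip(normal, direction)))
        for c in range(3):
            irradiance[c] += vis * radiance[c] * cos_term * solid_angle
    # Lambertian BRDF is albedo / pi.
    return [a * e / math.pi for a, e in zip(albedo, irradiance)]


# Toy usage: one light above the surface, one below, both white.
dirs = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
radiance = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
normal = (0.0, 0.0, 1.0)
albedo = (0.5, 0.5, 0.5)

lit = shade_diffuse(albedo, normal, dirs, radiance, [1.0, 1.0])
shadowed = shade_diffuse(albedo, normal, dirs, radiance, [0.0, 1.0])
```

Setting the visibility of the upper light to zero removes its contribution entirely, which is how an occluding body part would cast a shadow in this simplified model.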