SAGA: Surface-Aligned Gaussian Avatar

Ronghan Chen, Yang Cong, Jiayue Liu

TL;DR

This work tackles monocular dynamic human reconstruction by introducing SAGA, a two-stage Surface-Aligned Gaussian Avatar. The method first adheres Gaussians to a coarse SMPL mesh to enforce well-defined geometry and then detaches them to capture fine deformations, aided by a Gaussian-Mesh Alignment regularization and a Walking-on-Mesh strategy that keeps the triangle bindings accurate. The approach reports state-of-the-art novel-view and novel-pose synthesis with fast training (~12 minutes) and real-time rendering (60 FPS), and enables direct extraction of high-quality meshes from deformable Gaussians learned from monocular videos. By combining mesh regularization with expressive Gaussians and pose-driven colorization, SAGA improves geometric fidelity and rendering realism in challenging monocular settings, which makes it a practical basis for photorealistic avatar applications.

Abstract

This paper presents a Surface-Aligned Gaussian representation for creating animatable human avatars from monocular videos, aiming to improve novel view and pose synthesis performance while ensuring fast training and real-time rendering. Recently, 3DGS has emerged as a more efficient and expressive alternative to NeRF, and has been used for creating dynamic human avatars. However, when applied to the severely ill-posed task of monocular dynamic reconstruction, the Gaussians tend to overfit constantly changing regions such as clothing wrinkles or shadows, since these regions cannot provide consistent supervision. This results in noisy geometry and abrupt deformation that typically fail to generalize under novel views and poses. To address these limitations, we present SAGA, i.e., Surface-Aligned Gaussian Avatar, which aligns the Gaussians with a mesh to enforce well-defined geometry and consistent deformation, thereby improving generalization under novel views and poses. Unlike existing strict alignment methods that suffer from limited expressive power and low realism, SAGA employs a two-stage alignment strategy in which the Gaussians are first adhered to and then detached from the mesh, thus achieving both good geometry and high expressivity. In the first Adhered Stage, we improve the flexibility of Adhered-on-Mesh Gaussians by allowing them to flow on the mesh, in contrast to existing methods that rigidly bind Gaussians to fixed locations. In the second Detached Stage, we introduce a Gaussian-Mesh Alignment regularization, which unleashes expressivity by detaching the Gaussians while maintaining geometric alignment by minimizing their location and orientation offsets from the bound triangles. Finally, since the Gaussians may drift outside their bound triangles during optimization, an efficient Walking-on-Mesh strategy is proposed to dynamically update the bound triangles.
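The abstract describes the Gaussian-Mesh Alignment regularization only as minimizing location and orientation offsets from the bound triangle. As a rough illustration, the sketch below assumes the location offset is the squared distance from the Gaussian center to the triangle's plane and the orientation offset is one minus the absolute cosine between the Gaussian's shortest axis and the triangle normal. Both forms, and all function names, are assumptions made for illustration and are not taken from the paper.

```python
import math

# Small vector helpers on 3-element lists (pure Python, no dependencies).
def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def triangle_normal(v0, v1, v2):
    """Unit normal of the triangle (v0, v1, v2)."""
    return normalize(cross(sub(v1, v0), sub(v2, v0)))

def position_offset_loss(center, tri):
    """ASSUMED form: squared distance from the center to the triangle's plane."""
    n = triangle_normal(*tri)
    d = dot(sub(center, tri[0]), n)
    return d * d

def normal_offset_loss(short_axis, tri):
    """ASSUMED form: 1 - |cos| between the shortest Gaussian axis and the normal."""
    n = triangle_normal(*tri)
    return 1.0 - abs(dot(normalize(short_axis), n))

# Hypothetical bound triangle in the z = 0 plane.
tri = ([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
pos_loss = position_offset_loss([0.2, 0.2, 0.5], tri)   # center 0.5 above the plane
rot_loss = normal_offset_loss([0.0, 0.0, 2.0], tri)     # axis parallel to the normal
```

Both terms are zero when the Gaussian lies in the triangle's plane with its shortest axis along the normal, and they grow as it drifts away, which is the behavior a soft alignment penalty of this kind needs.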

Paper Structure

This paper contains 42 sections, 25 equations, 17 figures, 7 tables, 1 algorithm.

Figures (17)

  • Figure 1: UPPER: Illustration of SAGA, i.e., Surface-Aligned Gaussian Avatar, for monocular drivable avatar reconstruction and animation. LOWER: Since monocular dynamic reconstruction is severely ill-posed, state-of-the-art methods either (a) overfit the scene with naive Gaussians or (b) overconstrain the Gaussians by fixing them on the mesh. In contrast, (c) SAGA adopts a first-adhered-then-detached strategy to effectively regularize the Gaussians without sacrificing expressivity, (d) leading to more photorealistic rendering results.
  • Figure 2: The framework of Surface-Aligned Gaussian Avatar (SAGA). We model the human with a two-stage Surface-Aligned Gaussian representation in the canonical space, where the Gaussians are first strictly adhered to the SMPL mesh (Stage 1, Sec. \ref{sec:stage_one}), and then detached from the mesh to fit finer details (Stage 2, Sec. \ref{sec:stage_two}). The canonical Gaussians are sent into the Deformation & Colorization Module, which transforms them to the observation space, predicts the non-rigid deformation, and compensates for the color changes caused by motion (Sec. \ref{sec:def_color}). Finally, the Gaussians are rasterized to render the image. For backpropagation, we compute the Gaussian-Mesh Alignment Losses to regularize the deformed Gaussians to align with the mesh in the Detached Stage (Sec. \ref{sec:reg}). To prevent incorrect regularization when a Gaussian moves outside its triangle, we use the proposed retraction strategy to pull the Gaussian back within the triangle in the first stage, and the Walking-on-Mesh strategy to update the bound triangle in the second stage (Sec. \ref{sec:walk}).
  • Figure 3: Illustration of the Adhered-on-Mesh Gaussian. We first a) align the Gaussian center with the triangle by defining it in terms of barycentric coordinates. Then in b), we make the Gaussian flat, and in c), we fix the direction of the smallest scale $\mathbf{R}^{(0)}$ to the triangle normal $\mathbf{n}_{\mathbf{f}}$ to align the Gaussian orientation with the surface. Unlike earlier fixed-on-mesh representations (SuGaR, GoMAvatar), we simultaneously optimize the barycentric coordinates $\boldsymbol{a}$ and the mesh vertices $\mathbf{v}$, which allows the Gaussians to flow on the mesh for higher flexibility while driving the mesh to fit the scene.
  • Figure 4: Illustration of the Gaussian-Mesh Alignment losses, which consist of a position alignment loss and a normal alignment loss.
  • Figure 5: Illustration of the retraction strategy for optimizing out-of-triangle Gaussians in the Adhered Stage.
  • ...and 12 more figures
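
The Adhered-on-Mesh construction in Figure 3 and the retraction idea in Figure 5 can be sketched in a few lines. The example below assumes the Gaussian center is the barycentric combination of the bound triangle's vertices, and that an out-of-triangle Gaussian is pulled back by clamping negative barycentric coordinates to zero and renormalizing. The clamping rule is an assumption for illustration only (the captions do not specify the exact retraction rule), and the function names are hypothetical.

```python
def barycentric_point(a, tri):
    """Gaussian center as a weighted sum of triangle vertices: sum_i a[i] * v_i."""
    return [sum(a[i] * tri[i][k] for i in range(3)) for k in range(3)]

def is_inside(a, eps=1e-9):
    """A point is inside the triangle when no barycentric coordinate is negative."""
    return all(x >= -eps for x in a)

def retract(a):
    """ASSUMED retraction: clamp negatives to zero, then renormalize to sum to 1.

    This yields a valid point on the triangle, but it is not necessarily
    the closest point to the original location.
    """
    clipped = [max(x, 0.0) for x in a]
    s = sum(clipped)
    return [x / s for x in clipped]

# Hypothetical triangle and barycentric coordinates.
tri = ([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

a_in = [0.5, 0.25, 0.25]             # all non-negative: inside the triangle
center = barycentric_point(a_in, tri)

a_out = [1.2, -0.4, 0.2]             # one negative coordinate: drifted outside
a_back = retract(a_out) if not is_inside(a_out) else a_out
center_back = barycentric_point(a_back, tri)
```

Because the center is a function of both the barycentric coordinates and the vertices, gradients can reach both, which matches the caption's point that the Gaussians can flow on the mesh while the mesh itself is driven to fit the scene. In the Detached Stage the source instead updates the bound triangle (Walking-on-Mesh), which this sketch does not cover.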