
2DGS-Avatar: Animatable High-fidelity Clothed Avatar via 2D Gaussian Splatting

Qipeng Yan, Mingyang Sun, Lihua Zhang

TL;DR

This work introduces 2DGS-Avatar, a method for reconstructing real-time, high-fidelity clothed avatars from monocular RGB videos using 2D Gaussian Splatting. It initializes 2D Gaussian primitives on the canonical SMPL-X surface, maps them to pose space with forward skinning, and renders them with a differentiable 2DGS rasterizer. Training is supervised by RGB and normal maps and complemented by a self-supervised area regularization and an eccentricity filter, which improve how primitives are distributed over the surface and the quality of the resulting geometry. The approach combines fast training and rendering with detailed clothing geometry. It achieves quantitative results competitive with 3DGS-based methods while significantly reducing training time and memory usage, and it renders at ~60 FPS on consumer GPUs. Experiments on AvatarRex and THuman4.0 show strong qualitative and quantitative performance, and ablations confirm the contribution of each proposed component. The work has practical relevance for AR/VR and dynamic character capture. Its limitations lie in motion-induced wrinkles and underrepresented regions, which warrant future improvements in garment modeling.
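The forward-skinning step summarized above can be illustrated with a small NumPy sketch of standard linear blend skinning (LBS) applied to 2D Gaussian surfels. It assumes that each primitive stores a canonical center and two tangent vectors spanning its disk (as in 2DGS), that each primitive has skinning weights (for example, copied from the nearest SMPL-X vertex), and that 4×4 joint transforms from canonical to posed space are given. The function names and the Gram-Schmidt re-orthonormalization of the tangents are hypothetical, illustrative choices, not the paper's implementation.

```python
import numpy as np


def blend_transforms(weights, bone_transforms):
    """Linear blend skinning: per-primitive transform T_n = sum_k w_nk * G_k.

    weights:         (N, K) skinning weights, rows summing to 1
    bone_transforms: (K, 4, 4) joint transforms, canonical -> posed space
    returns:         (N, 4, 4) blended transforms
    """
    return np.einsum("nk,kij->nij", weights, bone_transforms)


def skin_surfels(centers, tangent_u, tangent_v, weights, bone_transforms):
    """Forward-skin 2D Gaussian surfels (center + two tangents) into posed space.

    centers, tangent_u, tangent_v: (N, 3) canonical-space quantities.
    Returns posed centers, orthonormal tangents, and unit normals t_u x t_v.
    Scales are not modified in this sketch (an assumption).
    """
    T = blend_transforms(weights, bone_transforms)
    A, t = T[:, :3, :3], T[:, :3, 3]            # blended linear part and translation

    posed_centers = np.einsum("nij,nj->ni", A, centers) + t
    tu = np.einsum("nij,nj->ni", A, tangent_u)  # tangents are directions: no translation
    tv = np.einsum("nij,nj->ni", A, tangent_v)

    # A blend of rotations is generally not a rotation, so re-orthonormalize
    # the tangent frame with Gram-Schmidt (a hypothetical, illustrative choice).
    tu = tu / np.linalg.norm(tu, axis=1, keepdims=True)
    tv = tv - np.sum(tv * tu, axis=1, keepdims=True) * tu
    tv = tv / np.linalg.norm(tv, axis=1, keepdims=True)
    normals = np.cross(tu, tv)                   # unit length since tu, tv are orthonormal
    return posed_centers, tu, tv, normals
```

Blending the full 4×4 matrices before applying them is plain LBS. Because the result can contain shear or scaling, the sketch rebuilds an orthonormal tangent frame so that the surfel normal used for normal supervision stays a unit vector.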

Abstract

Real-time rendering of high-fidelity, animatable avatars from monocular videos remains a challenging problem in computer vision and graphics. Over the past few years, Neural Radiance Fields (NeRF) have made significant progress in rendering quality but perform poorly at run time because volumetric rendering is inefficient. More recently, methods based on 3D Gaussian Splatting (3DGS) have shown great potential for fast training and real-time rendering. However, they still suffer from artifacts caused by inaccurate geometry. To address these problems, we propose 2DGS-Avatar, a novel approach based on 2D Gaussian Splatting (2DGS) for modeling animatable clothed avatars with high fidelity and fast training. Given monocular RGB videos as input, our method generates an avatar that can be driven by poses and rendered in real time. Compared to 3DGS-based methods, 2DGS-Avatar retains the advantages of fast training and rendering while also capturing detailed, dynamic, and photo-realistic appearances. We conduct extensive experiments on popular datasets such as AvatarRex and THuman4.0, demonstrating strong performance in both qualitative and quantitative evaluations.

Paper Structure

This paper contains 16 sections, 12 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Illustration of the pipeline. The orange arrows indicate backpropagation. Our method consists of two parts: (1) transforming Gaussian primitives from the canonical space to the posed space through forward skinning, followed by rasterization to render images and depth maps in the posed space; (2) optimizing the Gaussian primitives in the canonical space using a photometric loss, a normal loss, and a self-supervised loss.
  • Figure 2: Qualitative comparison on AvatarRex [zheng2023avatarrex]. We show results for both novel views and novel poses on the "avatarrex_zzr" and "avatarrex_lbn2" sequences of AvatarRex. Our method achieves visual quality comparable to Animatable Gaussians [hu2024gaussianavatar] while surpassing GauHuman [hu2024gauhuman] in surface details such as hands, clothes, and shoes.
  • Figure 3: Additional novel-pose results on the "subject00" and "subject02" sequences of THuman4.0 [zheng2022structured].
  • Figure 4: Visualization of the ablation study on $L_{area}$. With $L_{area}$, the Gaussian primitives tend to converge toward a more uniform distribution around the surface.
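The optimization terms named in the Figure 1 and Figure 4 captions can be sketched in NumPy as follows. The source does not give the exact formulas, so the area regularizer (the squared coefficient of variation of surfel areas), the eccentricity definition, and the threshold `e_max` below are hypothetical stand-ins chosen only to illustrate the idea; the normal term uses the common "1 minus cosine" form. All function names are illustrative assumptions.

```python
import numpy as np


def normal_loss(pred_normals, target_normals, mask):
    """Mean (1 - cosine) between rendered and reference normal maps.

    pred_normals, target_normals: (H, W, 3) unit normals.
    mask: (H, W) boolean foreground mask (assumed non-empty).
    """
    cos = np.sum(pred_normals * target_normals, axis=-1)
    return float(np.mean(1.0 - cos[mask]))


def surfel_areas(scales):
    """Area of each surfel's 1-sigma ellipse, pi * s_u * s_v; scales is (N, 2)."""
    return np.pi * scales[:, 0] * scales[:, 1]


def area_regularizer(scales, eps=1e-12):
    """HYPOTHETICAL stand-in for L_area: squared coefficient of variation of areas.

    Equals 0 when all surfels have the same area, so minimizing it pushes
    primitives toward similar sizes.
    """
    areas = surfel_areas(scales)
    return float(np.var(areas) / (np.mean(areas) ** 2 + eps))


def eccentricity(scales):
    """Ellipse eccentricity sqrt(1 - (s_min / s_max)^2): 0 for a disk, near 1 for a needle."""
    s_min = scales.min(axis=1)
    s_max = scales.max(axis=1)
    return np.sqrt(1.0 - (s_min / s_max) ** 2)


def eccentricity_keep_mask(scales, e_max=0.98):
    """HYPOTHETICAL filter: keep surfels whose eccentricity does not exceed e_max."""
    return eccentricity(scales) <= e_max
```

In a training loop, the normal and area terms would be weighted and added to the photometric loss, and the keep-mask would periodically prune needle-like primitives. The loss weights and the pruning schedule are not specified in this summary.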