RealityAvatar: Towards Realistic Loose Clothing Modeling in Animatable 3D Gaussian Avatars
Yahui Li, Zhi Zeng, Liming Pang, Guixuan Zhang, Shuwu Zhang
TL;DR
RealityAvatar tackles the challenge of accurately modeling dynamic loose clothing on animatable 3D human avatars. It introduces a 3D Gaussian Splatting framework augmented with a motion trend module (LSTM-based temporal modeling) and a latentbone encoder (region-based pose encoding with a clothing latent code) to capture pose-dependent deformations and temporal variations. The method maps canonical Gaussians to observation space via a skinning-based transformation and renders them with a differentiable Gaussian pipeline that incorporates motion cues and per-frame lighting changes. Extensive experiments on I3D-Human and ZJU-Mocap show state-of-the-art novel-view and novel-pose performance, improved temporal coherence, and better handling of non-rigid clothing dynamics, with competitive training efficiency. Overall, RealityAvatar advances high-fidelity, temporally consistent digital humans with loose clothing for applications in virtual production and real-time avatars.
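The skinning step mentioned above can be illustrated with standard linear blend skinning (LBS) applied to both Gaussian centers and covariances. The sketch below is a minimal NumPy illustration under assumed inputs (per-Gaussian skinning weights and per-bone 4x4 rigid transforms); it is not the authors' implementation, and the function name `skin_gaussians` is hypothetical.

```python
import numpy as np

def skin_gaussians(means, covs, weights, bone_transforms):
    """Map canonical Gaussians to observation space with linear blend skinning.

    means:           (N, 3)    canonical Gaussian centers
    covs:            (N, 3, 3) canonical covariance matrices
    weights:         (N, J)    skinning weights, each row assumed to sum to 1
    bone_transforms: (J, 4, 4) rigid bone transforms, canonical -> posed
    """
    # Blend the per-bone transforms into one 4x4 transform per Gaussian: (N, 4, 4).
    # Note: a weighted sum of rotations is not itself a rotation in general;
    # this is the usual LBS approximation.
    T = np.einsum("nj,jab->nab", weights, bone_transforms)
    R = T[:, :3, :3]  # blended linear part
    t = T[:, :3, 3]   # blended translation

    # Centers transform as points: x' = R x + t
    posed_means = np.einsum("nab,nb->na", R, means) + t
    # Covariances transform as Sigma' = R Sigma R^T
    posed_covs = R @ covs @ np.transpose(R, (0, 2, 1))
    return posed_means, posed_covs
```

For example, a Gaussian weighted 50/50 between an identity bone and a bone translated by 2 along x moves by 1 along x, while its covariance is unchanged because both bones share the same (identity) rotation.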
Abstract
Modeling animatable human avatars from monocular or multi-view videos has been widely studied, with recent approaches leveraging neural radiance fields (NeRFs) or 3D Gaussian Splatting (3DGS) achieving impressive results in novel-view and novel-pose synthesis. However, existing methods often struggle to accurately capture the dynamics of loose clothing, as they primarily rely on global pose conditioning or static per-frame representations, leading to oversmoothing and temporal inconsistencies in non-rigid regions. To address this, we propose RealityAvatar, an efficient framework for high-fidelity digital human modeling that specifically targets loosely dressed avatars. Our method leverages 3D Gaussian Splatting to capture complex clothing deformations and motion dynamics while ensuring geometric consistency. By incorporating a motion trend module and a latentbone encoder, we explicitly model pose-dependent deformations and temporal variations in clothing behavior. Extensive experiments on benchmark datasets demonstrate the effectiveness of our approach in capturing fine-grained clothing deformations and motion-driven shape variations. Our method significantly enhances structural fidelity and perceptual quality in dynamic human reconstruction, particularly in non-rigid regions, while achieving better consistency across temporal frames.
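The abstract does not spell out the motion trend module's architecture; the TL;DR only describes it as LSTM-based temporal modeling. As a rough sketch under that description, assuming the module summarizes a short window of past pose vectors into a feature that could then condition clothing deformation, a single-layer LSTM in NumPy might look like the following. The class name `MotionTrendLSTM`, the 72-dimensional pose vector (SMPL-style axis-angle for 24 joints), the window length, and the random initialization are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MotionTrendLSTM:
    """Hypothetical single-layer LSTM that summarizes a window of past poses."""

    def __init__(self, pose_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(pose_dim + hidden_dim)
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, scale, size=(4 * hidden_dim, pose_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def __call__(self, pose_window):
        """pose_window: (T, pose_dim), oldest frame first. Returns (hidden_dim,)."""
        H = self.hidden_dim
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        for x in pose_window:
            z = self.W @ np.concatenate([x, h]) + self.b
            i = sigmoid(z[:H])            # input gate
            f = sigmoid(z[H:2 * H])       # forget gate
            g = np.tanh(z[2 * H:3 * H])   # candidate cell update
            o = sigmoid(z[3 * H:])        # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        # The final hidden state serves as the motion-trend feature.
        return h
```

Because the hidden state is updated frame by frame, reversing the window generally changes the output; this order sensitivity is what lets a recurrent summary reflect a motion trend rather than a single static pose.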
