AniDress: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model
Beijia Chen, Yuefan Shen, Qing Shuai, Xiaowei Zhou, Kun Zhou, Youyi Zheng
TL;DR
AniDress tackles the challenge of animating loose-dressed avatars from sparse views by introducing a garment rigging model derived from physics-based simulation (PBS) and a pose-driven NeRF. The method jointly models body and garment dynamics, estimating temporally coherent garment poses from limited RGB data via differentiable rendering and 2D cues. A deformable NeRF conditioned on both body and garment poses enables high-quality rendering across novel views and poses, while test-time garment poses can be sourced from simulation or prediction to extend generalization. A new multi-view dataset of loose garments supports evaluation, and experiments demonstrate improved rendering quality and robust pose generalization over prior work. The code and data are to be released publicly.
Abstract
Recent years have seen significant progress in building photo-realistic animatable avatars from sparse multi-view videos. However, current workflows struggle to render realistic garment dynamics for loosely dressed characters, as they predominantly rely on naked body models for human modeling and leave the garment un-modeled. This is mainly because the deformations of loose garments are highly non-rigid, and capturing such deformations often requires dense views as supervision. In this paper, we introduce AniDress, a novel method for generating animatable human avatars in loose clothes using very sparse multi-view videos (4-8 views in our setting). To allow the capture and appearance learning of loose garments in such a setting, we employ a virtual bone-based garment rigging model obtained from physics-based simulation data. Such a model allows us to capture and render complex garment dynamics through a set of low-dimensional bone transformations. Technically, we develop a novel method for estimating temporally coherent garment dynamics from a sparse multi-view video. To render unseen garment states realistically from coarse estimations, we introduce a pose-driven deformable neural radiance field conditioned on both body and garment motions, providing explicit control of both parts. At test time, new garment poses can be captured from unseen situations or derived from a physics-based or neural network-based simulator to drive unseen garment dynamics. To evaluate our approach, we create a multi-view dataset that captures loose-dressed performers with diverse motions. Experiments show that our method renders natural garment dynamics that deviate strongly from the body and generalizes well to both unseen views and poses, surpassing existing methods. The code and data will be publicly available.
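To make the idea of driving a garment with "a set of low-dimensional bone transformations" concrete, the following is a minimal sketch of how virtual bones could deform garment vertices. It assumes standard linear blend skinning, which is a common choice for rigging models but is not confirmed by the abstract; the function names `skin_garment` and `translation`, the array shapes, and the toy data are all hypothetical and are not taken from the AniDress code.

```python
import numpy as np


def translation(t):
    """Build a 4x4 homogeneous transform that translates by t (hypothetical helper)."""
    T = np.eye(4)
    T[:3, 3] = t
    return T


def skin_garment(rest_vertices, weights, bone_transforms):
    """Deform garment vertices with linear blend skinning (an assumed formulation).

    rest_vertices:   (V, 3) garment vertices in the rest pose
    weights:         (V, B) skinning weights, each row summing to 1
    bone_transforms: (B, 4, 4) one homogeneous transform per virtual bone
    returns:         (V, 3) deformed vertices
    """
    num_vertices = rest_vertices.shape[0]
    # Append a homogeneous coordinate so 4x4 transforms can be applied: (V, 4)
    homo = np.concatenate([rest_vertices, np.ones((num_vertices, 1))], axis=1)
    # Blend the bone transforms per vertex using the skinning weights: (V, 4, 4)
    blended = np.einsum("vb,bij->vij", weights, bone_transforms)
    # Apply each vertex's blended transform to that vertex: (V, 4)
    deformed = np.einsum("vij,vj->vi", blended, homo)
    return deformed[:, :3]


# Toy example: two vertices, two virtual bones.
rest = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0],   # vertex 0 follows bone 0 only
                    [0.5, 0.5]])  # vertex 1 is shared equally by both bones
bones = np.stack([np.eye(4), translation([0.0, 2.0, 0.0])])

deformed = skin_garment(rest, weights, bones)
print(deformed)
```

In this sketch the garment state for a frame is fully described by the `(B, 4, 4)` bone transforms rather than by every vertex position, which illustrates why such a representation is low-dimensional and could plausibly be estimated from sparse views.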
