AniDress: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model

Beijia Chen, Yuefan Shen, Qing Shuai, Xiaowei Zhou, Kun Zhou, Youyi Zheng

TL;DR

AniDress tackles the challenge of animating loose-dressed avatars from sparse views by introducing a garment rigging model derived from physics-based simulation (PBS) and a pose-driven NeRF. The method jointly models body and garment dynamics, estimating temporally coherent garment poses from limited RGB data via differentiable rendering and 2D cues. A deformable NeRF conditioned on both body and garment poses enables high-quality rendering from novel views and in novel poses, while test-time garment poses can be sourced from simulation or prediction to extend generalization. A new multi-view dataset of loose garments supports evaluation, and experiments demonstrate improved rendering quality and robust pose generalization over prior work. The code and data are to be released publicly.
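
The garment rigging model drives the garment with linear blend skinning (LBS) over a small set of virtual bones, so each deformed vertex is a weighted blend of its rigidly transformed positions, $\mathbf{v}_i' = \sum_k w_{ik}(\mathbf{R}_k \mathbf{v}_i + \mathbf{t}_k)$. The NumPy sketch below shows only this forward skinning step. The function name `garment_lbs` and the array layout are hypothetical; in AniDress the weights come from skinning decomposition of simulation data rather than being hand-set as in the usage example.

```python
import numpy as np

def garment_lbs(rest_verts, weights, rotations, translations):
    """Deform garment vertices with LBS over K virtual bones (illustrative sketch).

    rest_verts:   (V, 3) template garment vertices
    weights:      (V, K) skinning weights, each row assumed to sum to 1
    rotations:    (K, 3, 3) per-bone rotation matrices R_k
    translations: (K, 3) per-bone translations t_k
    """
    # Apply every bone transform to every vertex: per_bone[k, v] = R_k v + t_k, shape (K, V, 3)
    per_bone = np.einsum("kij,vj->kvi", rotations, rest_verts) + translations[:, None, :]
    # Blend the K candidate positions with the per-vertex weights, shape (V, 3)
    return np.einsum("vk,kvi->vi", weights, per_bone)

# Usage: two bones; the first vertex is split evenly between a static bone
# and a bone shifted by +1 in x, so it moves halfway.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
w = np.array([[0.5, 0.5], [1.0, 0.0]])
R = np.stack([np.eye(3), np.eye(3)])
t = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
deformed = garment_lbs(rest, w, R, t)
```

Because the pose is just the K bone transforms, a garment state is described by far fewer parameters than per-vertex offsets, which is what makes fitting from sparse views tractable.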

Abstract

The community has recently made significant progress in building photo-realistic animatable avatars from sparse multi-view videos. However, current workflows struggle to render realistic garment dynamics for loose-fitting characters because they predominantly rely on naked body models for human modeling and leave the garment unmodeled. This is mainly because the deformations of loose garments are highly non-rigid, and capturing such deformations often requires dense views as supervision. In this paper, we introduce AniDress, a novel method for generating animatable human avatars in loose clothes from very sparse multi-view videos (4 to 8 views in our setting). To enable the capture and appearance learning of loose garments in this setting, we employ a virtual bone-based garment rigging model obtained from physics-based simulation data. Such a model allows us to capture and render complex garment dynamics through a set of low-dimensional bone transformations. Technically, we develop a novel method for estimating temporally coherent garment dynamics from a sparse multi-view video. To render unseen garment states realistically from coarse estimations, we introduce a pose-driven deformable neural radiance field conditioned on both body and garment motions, which provides explicit control over both parts. At test time, new garment poses for unseen situations can be obtained from a physics-based or neural network-based simulator to drive unseen garment dynamics. To evaluate our approach, we create a multi-view dataset that captures loose-dressed performers with diverse motions. Experiments show that our method renders natural garment dynamics that deviate significantly from the body and generalizes well to both unseen views and unseen poses, outperforming existing methods. The code and data will be publicly available.
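
The abstract does not spell out how temporal coherence is enforced. One common way to turn noisy per-frame pose estimates into a coherent track is a quadratic penalty on second differences, $\min_{\mathbf{x}} \|\mathbf{x} - \mathbf{y}\|^2 + \lambda \|D_2 \mathbf{x}\|^2$, which has the closed-form solution $(I + \lambda D_2^\top D_2)\,\mathbf{x} = \mathbf{y}$. The sketch below implements that generic smoother; it is an illustration under this assumption, not the paper's estimator, and `smooth_pose_track` and `lam` are hypothetical names.

```python
import numpy as np

def smooth_pose_track(noisy, lam=10.0):
    """Generic temporal smoother (not AniDress's method).

    Minimizes ||x - noisy||^2 + lam * ||D2 x||^2 independently per parameter.
    noisy: (T, P) per-frame pose parameters, e.g. flattened bone transforms
    lam:   smoothness weight (hypothetical default)
    """
    T = noisy.shape[0]
    if T < 3:
        # No second differences exist for fewer than three frames
        return noisy.copy()
    # Second-difference operator D2 of shape (T-2, T): x[t] - 2 x[t+1] + x[t+2]
    D2 = np.zeros((T - 2, T))
    for t in range(T - 2):
        D2[t, t:t + 3] = [1.0, -2.0, 1.0]
    # Normal equations, solved for all P parameter columns at once
    A = np.eye(T) + lam * D2.T @ D2
    return np.linalg.solve(A, noisy)
```

Linear motion has zero second difference, so it passes through unchanged, while frame-to-frame jitter is damped; larger `lam` trades fidelity to the per-frame estimates for smoothness.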

Paper Structure

This paper contains 23 sections, 9 equations, 11 figures, and 8 tables.

Figures (11)

  • Figure 1: Given a sparse multi-view video with a loose-dressed performer, we estimate both body and garment poses aided by a garment rigging model. Then, a pose-driven neural radiance field is optimized to fit the video. At test time, our method can synthesize plausible body and garment motions from novel views. In this case, we use body motions from AMASS [AMASS:2019] and garment poses from physics-based simulation for novel pose synthesis.
  • Figure 2: Overview of the procedures for building the garment rigging model and capturing garment poses from a multi-view video. Starting from a template mesh $\mathcal{M}$, we run physics-based simulation to generate diverse garment shapes $\{\mathcal{M}^p\}$, from which a garment LBS model is extracted via skinning decomposition. In the fitting step, we use the garment masks $I^M$, image normals $I^N$, and optical flows $I^{of}$ to estimate the garment poses at each frame.
  • Figure 3: Overview of our rendering pipeline. For each sampled point in observation space, we first transform it back to the canonical space using a pose-driven deformation module conditioned on both the body poses $\mathbf{B}$ and the garment poses $\mathbf{G}$, and then query its color and density $(\mathbf{c}, \sigma)$ through a radiance field $\mathcal{F}_c$ defined in the canonical space.
  • Figure 4: Visualizations of our garment fitting results. The first and third rows show the inputs and the rendered garment geometry, respectively. The second row overlays the rendered mesh geometry onto the input images to illustrate the alignment.
  • Figure 5: Visual comparisons on novel view synthesis. We also show an additional view of our results, denoted Ours$'$, together with the corresponding ground truth, GT$'$.
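
Figure 3 queries a color and density $(\mathbf{c}, \sigma)$ for each sample along a camera ray. In NeRF-style methods these samples are composited into a pixel color with the volume-rendering quadrature $C = \sum_i T_i\,(1 - e^{-\sigma_i \delta_i})\,\mathbf{c}_i$, where $T_i = \prod_{j<i} (1 - \alpha_j)$ and $\delta_i$ is the spacing between samples. The sketch below assumes AniDress uses this standard compositing, which the figure captions do not state explicitly; `composite_ray` is a hypothetical name.

```python
import numpy as np

def composite_ray(colors, sigmas, t_vals):
    """Standard NeRF volume-rendering quadrature for a single ray (sketch).

    colors: (N, 3) colors c_i at the N samples
    sigmas: (N,) densities sigma_i
    t_vals: (N,) sorted sample depths along the ray
    """
    # Spacing between adjacent samples; the last interval is treated as very long
    deltas = np.append(np.diff(t_vals), 1e10)
    # Per-sample opacity alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), as an exclusive cumulative product
    trans = np.cumprod(np.append(1.0, 1.0 - alphas[:-1]))
    weights = trans * alphas
    # Pixel color is the weighted sum of sample colors
    return weights @ colors, weights
```

The returned per-sample weights are also what NeRF-style pipelines use to render depth or masks, which is one way 2D cues such as garment masks can supervise geometry.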