DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation

Raquel Vidaurre, Elena Garces, Dan Casas

TL;DR

This paper addresses realistic 3D garment animation conditioned on body pose, shape, and garment design by introducing a diffusion model that operates on 2D UV textures. The key idea is to represent 3D garment deformations as a UV displacement map $\mathbf{y}$ that stores 3D offsets relative to a parametric garment template, so that the deformed garment is $T_g = G_{\text{design}}(\mathbf{p}) + \phi(G_{\text{wrinkles}}(\beta, \uptheta, \mathbf{p}))$, and to learn $p(\mathbf{y}|\mathbf{c})$ with a conditional diffusion model, where $\mathbf{c}=[\beta, \uptheta, \mathbf{p}]$ collects the shape, pose, and design parameters. Temporal coherence is achieved by additionally conditioning the diffusion model on the previous frame's UV state. The method produces both static and temporally coherent dynamic wrinkles, is trained on a dataset of 17 garment designs and 52 sequences, and shows competitive qualitative and quantitative performance against prior approaches while remaining agnostic to mesh topology. Limitations include body-garment collisions and potential over-smoothing with more data, which could be mitigated by latent diffusion and longer temporal modeling. Overall, DiffusedWrinkles enables realistic, controllable garment animation across designs without per-design regressors, with potential impact on interactive avatar rendering and gaming pipelines.
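As a rough illustration of the template-plus-offsets formula above, the following NumPy sketch adds offsets read from a UV displacement map to the vertices of a template mesh. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `sample_uv_offsets` and `deform_garment` are hypothetical, the map is assumed to be an `(H, W, 3)` array of per-texel 3D offsets, and $\phi$ is assumed to be a bilinear lookup at per-vertex UV coordinates in $[0, 1]$.

```python
import numpy as np

def sample_uv_offsets(uv_map, uv_coords):
    """Bilinearly sample an (H, W, 3) offset map at (N, 2) UV coords in [0, 1].

    Assumption: this lookup stands in for the paper's UV-to-3D mapping phi.
    """
    h, w, _ = uv_map.shape
    # Map UV coordinates in [0, 1] to continuous pixel coordinates.
    x = uv_coords[:, 0] * (w - 1)
    y = uv_coords[:, 1] * (h - 1)
    # Index of the top-left texel of each 2x2 neighbourhood, kept in bounds.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    # Fractional position inside the neighbourhood, shaped for broadcasting.
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = (1 - fx) * uv_map[y0, x0] + fx * uv_map[y0, x0 + 1]
    bottom = (1 - fx) * uv_map[y0 + 1, x0] + fx * uv_map[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom

def deform_garment(template_vertices, uv_map, uv_coords):
    """Return template vertices (N, 3) displaced by the sampled UV offsets."""
    return template_vertices + sample_uv_offsets(uv_map, uv_coords)
```

Because the offsets live in a texture rather than on the vertices, the same map can be sampled for any mesh that carries UV coordinates in the shared layout, which is one way to read the claim that the representation is agnostic to mesh topology.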

Abstract

We present a data-driven method for learning to generate animations of 3D garments using a 2D image diffusion model. In contrast to existing methods, typically based on fully connected networks, graph neural networks, or generative adversarial networks, which struggle with parametric garments that exhibit fine wrinkle detail, our approach is able to synthesize high-quality 3D animations for a wide variety of garments and body shapes, while being agnostic to the garment mesh topology. Our key idea is to represent 3D garment deformations as a 2D layout-consistent texture that encodes 3D offsets with respect to a parametric garment template. Using this representation, we encode a large dataset of garments simulated in various motions and shapes and train a novel conditional diffusion model that is able to synthesize high-quality pose-, shape-, and design-dependent 3D garment deformations. Since our model is generative, we can synthesize various plausible deformations for a given target pose, shape, and design. Additionally, we show that we can further condition our model using an existing garment state, which enables the generation of temporally coherent sequences.

Paper Structure

This paper contains 14 sections, 4 equations, 9 figures.

Figures (9)

  • Figure 1: Three garment designs 3D garment deformation model.
  • Figure 2: We use a UNet architecture (Ho et al., 2020) with six ResNet blocks. The conditioning vector, aggregated in the ResNet blocks, contains the pose, shape, and design parameters.
  • Figure 3: Temporally coherent diffusion model. To account for temporal consistency in the generated sequences while varying the pose parameters, we concatenate the output of the previous frame in the sequence.
  • Figure 4: Dataset samples. We simulate garments on different bodies (top), which we convert into a UV image (bottom) that faithfully reconstructs the original garment (middle).
  • Figure 5: Quantitative evaluation of our temporally coherent diffusion model (in red) and per-frame diffusion model (in blue) on two test sequences. Since our temporal model is conditioned on the previous deformation state of the garment, the resulting animations are temporally smooth and closer to the ground truth surface.
  • ...and 4 more figures
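
The conditioning described in the captions of Figures 2 and 3 can be sketched as a sampling loop in which each frame's UV map is denoised while the previous frame's output is concatenated to the network input. This is a minimal sketch under stated assumptions, not the authors' code: it uses the standard DDPM ancestral sampler of Ho et al. (2020) with a linear noise schedule, concatenates along the channel axis, and starts the sequence from an all-zero previous state. The `denoiser` callable is a hypothetical stand-in for the trained UNet, and the names `make_schedule`, `sample_frame`, and `sample_sequence` are illustrative only.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not from the source)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def sample_frame(denoiser, cond, prev_frame, shape, rng, num_steps=50):
    """Draw one UV displacement map conditioned on cond and the previous frame."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    y = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        # Assumption: the previous frame is appended along the channel axis.
        net_in = np.concatenate([y, prev_frame], axis=-1)
        eps_hat = denoiser(net_in, t, cond)  # predicted noise, same shape as y
        # Posterior mean of the standard DDPM reverse step.
        mean = (y - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        y = mean + np.sqrt(betas[t]) * noise
    return y

def sample_sequence(denoiser, conds, shape, rng, num_steps=50):
    """Generate one frame per conditioning vector, feeding each output forward."""
    prev = np.zeros(shape)  # assumption: zero state before the first frame
    frames = []
    for cond in conds:
        prev = sample_frame(denoiser, cond, prev, shape, rng, num_steps)
        frames.append(prev)
    return np.stack(frames)
```

Here `conds` would hold one vector of shape, pose, and design parameters per frame, and `denoiser(net_in, t, cond)` must return an array with the same shape as a single UV map. Sampling each frame independently instead corresponds to the per-frame baseline that Figure 5 compares against.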