Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer

Rong Wang, Wei Mao, Changsheng Lu, Hongdong Li

TL;DR

The paper tackles the challenge of producing high-fidelity 3D garment deformations by avoiding the artifacts of linear blend skinning. It decouples low- and high-frequency garment information into vertex positions and normals, renders both as 2D texture images, and performs 2D image transfer to estimate pose-dependent shape and wrinkles. A multimodal fusion framework then refines the 3D mesh by combining information from both modalities, leveraging pretrained vision models for perceptual quality. Experiments on multiple datasets show improved deformation accuracy and wrinkle realism over skinning-based methods, with strong generalization to new garments and body shapes. This approach enables scalable, topologically diverse garment animation without manual UV partitioning or explicit skinning supervision.

Abstract

We present a novel method for generating 3D garment deformations from given body poses, which is key to a wide range of applications, including virtual try-on and extended reality. To simplify the cloth dynamics, existing methods mostly rely on linear blend skinning to obtain the low-frequency posed garment shape and regress only the high-frequency wrinkles. However, due to the lack of explicit skinning supervision, such skinning-based approaches often produce misaligned shapes when posing the garment, which corrupts the high-frequency signals and prevents the recovery of high-fidelity wrinkles. To tackle this issue, we propose a skinning-free approach that independently estimates the posed (i) vertex positions for the low-frequency posed garment shape, and (ii) vertex normals for the high-frequency local wrinkle details. In this way, the two frequency modalities are effectively decoupled and each is directly supervised by the geometry of the deformed garment. To further improve the visual quality of the animation, we propose to encode both vertex attributes as rendered texture images, so that 3D garment deformation can be equivalently achieved via 2D image transfer. This enables us to leverage powerful pretrained image models to recover fine-grained visual details in wrinkles, while maintaining superior scalability for garments of diverse topologies without relying on manual UV partitioning. Finally, we propose a multimodal fusion step that incorporates constraints from both frequency modalities and robustly recovers deformed 3D garments from the transferred images. Extensive experiments show that our method significantly improves animation quality on various garment types and recovers finer wrinkles than state-of-the-art methods.
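To make the two vertex attributes concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes area-weighted vertex normals for a triangle mesh and maps positions and normals to color values in [0, 1], which is one common way to store such attributes as image pixels. The function names, the fixed bounding box, and the toy mesh are assumptions made for illustration; the paper's actual rendering and normalization scheme is not described in this summary.

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Area-weighted unit normals per vertex of a triangle mesh."""
    tri = vertices[faces]                                   # (F, 3, 3) corner positions
    # Cross product of two edges: direction is the face normal,
    # length is twice the triangle area (gives the area weighting).
    face_n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals = np.zeros_like(vertices)
    for corner in range(3):
        # Accumulate each face normal onto its three corner vertices.
        np.add.at(normals, faces[:, corner], face_n)
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.clip(lengths, 1e-12, None)

def positions_to_colors(vertices, lo, hi):
    """Map positions inside the box [lo, hi] to colors in [0, 1] (assumed encoding)."""
    return (vertices - lo) / (hi - lo)

def normals_to_colors(normals):
    """Map unit normals with components in [-1, 1] to colors in [0, 1]."""
    return 0.5 * (normals + 1.0)

# Hypothetical toy mesh: a unit square in the z = 0 plane, split into two triangles.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])

normals = vertex_normals(vertices, faces)
# Assumed fixed bounding box for the position encoding.
position_colors = positions_to_colors(vertices, lo=np.full(3, -1.0), hi=np.full(3, 1.0))
normal_colors = normals_to_colors(normals)
```

In this sketch the position colors vary smoothly with the overall shape, while the normal colors change wherever the surface orientation changes, which is why normals are a natural carrier for fine wrinkle detail.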

Paper Structure

This paper contains 12 sections, 8 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Illustration of our method. Given the input garment and body meshes (a), previous work [patel2020tailornet] relies on linear blend skinning (LBS) to generate the low-frequency (LF) posed garment shape. However, inaccurate skinning in LBS can produce artifacts and a misaligned garment position (b), which corrupts the high-frequency (HF) signals and hinders wrinkle regression (c). In contrast, we decompose the frequency modalities using two geometric attributes, vertex positions and normals, which are rendered as 2D texture images (d) and then transferred on pixel intensities (e) to represent the garment deformation. After fusing both modalities, we generate a deformed garment with more accurate wrinkles (f).
  • Figure 2: Overview of our method. Given the input garment template $\bar{\mathbf{M}}_g$ and the posed body mesh $\mathbf{M}_b$, we first render position and normal images for the garment $\{\bar{\mathcal{P}}_g^s, \bar{\mathcal{N}}_g^s\}$ and the body $\{\mathcal{P}_b^s, \mathcal{N}_b^s\}$ from each view $s$, projecting the 3D geometry onto the image space. Next, we transfer position images with the network $f_p(\cdot)$ and normal images with the network $f_n(\cdot)$; the two networks share the architecture shown in the top row (taking front normal images as an example). Finally, we initialize the posed garment mesh from the transferred position images $\hat{\mathcal{P}}_g^s$ and recover the missing wrinkle details by fusing in the normal images $\hat{\mathcal{N}}_g^s$ to obtain the deformed garment $\hat{\mathbf{M}}_g$. "$\oplus$" denotes a residual connection.
  • Figure 3: Results on the VTO dataset. We produce more accurate wrinkles and folds than skinning-based methods [patel2020tailornet, santesteban2021self].
  • Figure 4: Results on the TailorNet dataset. Our method consistently produces more accurate deformations on lower garments than [patel2020tailornet, grigorev2023hood].
  • Figure 5: Qualitative results with human motions. Our method generates accurate and plausible garment deformations for a sequence of unseen human poses. Moreover, the deformed garments are temporally consistent. We show more results in the supplementary video.
  • ...and 3 more figures
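
For readers unfamiliar with the linear blend skinning (LBS) baseline mentioned in the Figure 1 caption, here is a minimal NumPy sketch of the standard LBS formula, in which each posed vertex is a weighted sum of per-bone rigid transforms of its rest position. It is a generic textbook formulation, not code from the paper or from any compared method; the function name and the two-bone toy setup are assumptions for illustration.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, weights, rotations, translations):
    """Standard LBS: v'_i = sum_k w_ik * (R_k v_i + t_k).

    rest_vertices: (V, 3), weights: (V, K) with rows summing to 1,
    rotations: (K, 3, 3), translations: (K, 3).
    """
    # Apply every bone's rigid transform to every vertex: result is (K, V, 3).
    per_bone = np.einsum("kij,vj->kvi", rotations, rest_vertices)
    per_bone = per_bone + translations[:, None, :]
    # Blend the per-bone results with the skinning weights: result is (V, 3).
    return np.einsum("vk,kvi->vi", weights, per_bone)

# Hypothetical setup: bone 0 stays fixed, bone 1 rotates 90 degrees about the z axis.
rotations = np.array([
    np.eye(3),
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
])
translations = np.zeros((2, 3))

# Two vertices at the same rest position with different skinning weights.
rest_vertices = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0],    # fully attached to bone 0
                    [0.5, 0.5]])   # blended equally between both bones

posed = linear_blend_skinning(rest_vertices, weights, rotations, translations)
```

The equally blended vertex lands at (0.5, 0.5, 0), closer to the rotation axis than either rigidly transformed copy. This shrinkage is a well-known limitation of linear blending in general and is shown only to illustrate how skinning weights shape the posed result; it is not necessarily the specific failure depicted in the figure.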