Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer
Rong Wang, Wei Mao, Changsheng Lu, Hongdong Li
TL;DR
The paper tackles the challenge of producing high-fidelity 3D garment deformations by avoiding the artifacts of linear blend skinning. It decouples low- and high-frequency garment information into vertex positions and normals, renders both as 2D texture images, and performs 2D image transfer to estimate pose-dependent shape and wrinkles. A multimodal fusion framework then refines the 3D mesh by combining information from both modalities, leveraging pretrained vision models for perceptual quality. Experiments on multiple datasets show improved deformation accuracy and wrinkle realism over skinning-based methods, with strong generalization to new garments and body shapes. This approach enables scalable animation of garments with diverse topologies, without manual UV partitioning or explicit skinning supervision.
Abstract
We present a novel method for generating 3D garment deformations from given body poses, which is key to a wide range of applications, including virtual try-on and extended reality. To simplify the cloth dynamics, existing methods mostly rely on linear blend skinning to obtain the low-frequency posed garment shape and only regress high-frequency wrinkles. However, due to the lack of explicit skinning supervision, such skinning-based approaches often produce misaligned shapes when posing the garment, which in turn corrupts the high-frequency signals and prevents the recovery of high-fidelity wrinkles. To tackle this issue, we propose a skinning-free approach that independently estimates the posed (i) vertex positions for the low-frequency garment shape and (ii) vertex normals for the high-frequency local wrinkle details. In this way, the two frequency modalities are effectively decoupled, and each is directly supervised by the geometry of the deformed garment. To further improve the visual quality of the animation, we propose to encode both vertex attributes as rendered texture images, so that 3D garment deformation can be equivalently achieved via 2D image transfer. This enables us to leverage powerful pretrained image models to recover fine-grained visual details in wrinkles, while maintaining superior scalability for garments of diverse topologies without relying on manual UV partitioning. Finally, we propose a multimodal fusion step that incorporates constraints from both frequency modalities and robustly recovers deformed 3D garments from the transferred images. Extensive experiments show that our method significantly improves animation quality on various garment types and recovers finer wrinkles than state-of-the-art methods.
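
To make the two vertex attributes concrete, the following is a minimal sketch of how per-vertex positions and normals can be read off a triangle mesh with NumPy. It is an illustration only and not the authors' implementation: the function name `vertex_normals`, the area-weighted averaging scheme, and the toy quad mesh are all assumptions introduced here.

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Area-weighted per-vertex normals for a triangle mesh (hypothetical helper).

    vertices: (V, 3) float array of vertex positions (the low-frequency attribute).
    faces:    (F, 3) int array of vertex indices, counter-clockwise winding.
    Returns a (V, 3) array of unit normals (the high-frequency attribute).
    """
    v0 = vertices[faces[:, 0]]
    v1 = vertices[faces[:, 1]]
    v2 = vertices[faces[:, 2]]
    # The cross product has length equal to twice the triangle area, so summing
    # unnormalized face normals weights each face by its area.
    face_n = np.cross(v1 - v0, v2 - v0)
    normals = np.zeros_like(vertices, dtype=float)
    for k in range(3):
        # np.add.at accumulates correctly when a vertex index repeats.
        np.add.at(normals, faces[:, k], face_n)
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    # Guard against division by zero for isolated or degenerate vertices.
    return normals / np.clip(lengths, 1e-12, None)

# Toy example (assumed data): a flat unit quad in the xy-plane, split into two triangles.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])
normals = vertex_normals(vertices, faces)
```

On a flat patch every normal points along the same axis, whereas a wrinkle changes the normals sharply while moving the positions only slightly. This is one way to see why normals are a natural carrier for high-frequency detail.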
