A Generative Multi-Resolution Pyramid and Normal-Conditioning 3D Cloth Draping
Hunor Laczkó, Meysam Madadi, Sergio Escalera, Jordi Gonzalez
TL;DR
This work addresses 3D garment generation and draping by introducing a pyramid conditional variational autoencoder that operates in a canonical pose space and uses UV map representations. It conditions garment generation on garment templates, posed body UVs, and surface normals, with a dedicated normal encoder to enable sampling. The pyramid architecture progressively adds details from low to high frequency across multiple resolutions, yielding state-of-the-art results on CLOTH3D and CAPE while generalizing to unseen garments and poses even with limited training data. The approach achieves fast inference and demonstrates robust handling of detail, texture, and geometric consistency, offering a practical solution for generative 3D cloth draping and virtual try-on applications.
Abstract
RGB cloth generation has been studied extensively in the related literature; however, 3D garment generation remains an open problem. In this paper, we build a conditional variational autoencoder for 3D garment generation and draping. We propose a pyramid network that adds garment details progressively in a canonical space, i.e., after unposing and unshaping the garments w.r.t. the body. We study conditioning the network on surface normal UV maps as an intermediate representation, which are easier to optimize than 3D coordinates. Our results on two public datasets, CLOTH3D and CAPE, show that our model is robust, is controllable in terms of detail generation through the use of multi-resolution pyramids, and achieves state-of-the-art results that generalize well to unseen garments, poses, and shapes even when trained with small amounts of data.
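To make the multi-resolution idea concrete, the following is a minimal sketch of a coarse-to-fine residual pyramid over a UV map, written with NumPy. It is not the network proposed in the paper: the function names (`downsample`, `upsample`, `build_pyramid`, `reconstruct`), the 2x2 average pooling, the nearest-neighbour upsampling, and the random 64x64 three-channel map are all assumptions chosen for illustration. It only shows how a map can be split into a coarse base plus progressively finer residuals and then rebuilt level by level.

```python
import numpy as np


def downsample(uv_map):
    """Halve the resolution with 2x2 average pooling; uv_map has shape (H, W, C)."""
    h, w, c = uv_map.shape
    return uv_map.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample(uv_map):
    """Double the resolution by nearest-neighbour repetition."""
    return uv_map.repeat(2, axis=0).repeat(2, axis=1)


def build_pyramid(uv_map, levels):
    """Return [coarsest base, residual, ..., finest residual], ordered coarse to fine."""
    residuals = []
    current = uv_map
    for _ in range(levels - 1):
        coarse = downsample(current)
        # The residual holds the detail lost when going one level coarser.
        residuals.append(current - upsample(coarse))
        current = coarse
    return [current] + residuals[::-1]


def reconstruct(pyramid):
    """Add detail back level by level, from the coarse base to full resolution."""
    out = pyramid[0]
    for residual in pyramid[1:]:
        out = upsample(out) + residual
    return out


# Hypothetical stand-in for a garment UV map (e.g. 3D offsets or normals per texel).
rng = np.random.default_rng(0)
uv = rng.normal(size=(64, 64, 3))

pyramid = build_pyramid(uv, levels=4)
recon = reconstruct(pyramid)
print([level.shape for level in pyramid])
print(np.allclose(recon, uv))
```

In this sketch, truncating the loop in `reconstruct` after the first few levels gives a smoother, lower-frequency version of the map, which is one simple way to picture controlling the amount of generated detail per resolution.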
