GarmentPainter: Efficient 3D Garment Texture Synthesis with Character-Guided Diffusion Model

Jinbo Wu, Xiaobo Gao, Xing Liu, Chen Zhao, Jialun Liu

TL;DR

This work introduces GarmentPainter, a simple yet efficient framework for synthesizing high-quality, 3D-aware garment textures in UV space that achieves state-of-the-art performance in terms of visual fidelity, 3D consistency, and computational efficiency, outperforming existing methods in both qualitative and quantitative evaluations.

Abstract

Generating high-fidelity, 3D-consistent garment textures remains a challenging problem due to the inherent complexity of garment structures and the stringent requirement for detailed, globally consistent texture synthesis. Existing approaches rely on 2D-based diffusion models, which inherently struggle with 3D consistency; require expensive multi-step optimization; or depend on strict spatial alignment between 2D reference images and 3D meshes, which limits their flexibility and scalability. In this work, we introduce GarmentPainter, a simple yet efficient framework for synthesizing high-quality, 3D-aware garment textures in UV space. Our method uses a UV position map as 3D structural guidance, ensuring texture consistency across the garment surface during generation. To enhance control and adaptability, we introduce a type selection module that enables fine-grained texture generation for specific garment components based on a character reference image, without requiring alignment between the reference image and the 3D mesh. GarmentPainter efficiently integrates all guidance signals into the input of a diffusion model in a spatially aligned manner, without modifying the underlying UNet architecture. Extensive experiments demonstrate that GarmentPainter achieves state-of-the-art performance in visual fidelity, 3D consistency, and computational efficiency, outperforming existing methods in both qualitative and quantitative evaluations.
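As a rough illustration of what "integrating all guidance signals into the input of a diffusion model in a spatially aligned manner" could look like, the following NumPy sketch stacks same-resolution conditioning maps along the channel axis. The function name `assemble_unet_input`, the argument names, the 4-channel latents, and the concatenation order are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def assemble_unet_input(noisy_latent, ref_latent, pos_latent,
                        masked_tex_latent, mask_lowres):
    """Concatenate guidance signals along the channel axis.

    Every input is a (C, h, w) array sharing the same spatial size (h, w),
    so each spatial location of the result carries all signals at once.
    Channel counts and ordering are illustrative assumptions.
    """
    parts = [noisy_latent, ref_latent, pos_latent,
             masked_tex_latent, mask_lowres]
    h, w = noisy_latent.shape[-2:]
    for p in parts:
        if p.ndim != 3 or p.shape[-2:] != (h, w):
            raise ValueError("all inputs must be (C, h, w) with equal h, w")
    return np.concatenate(parts, axis=0)


# Hypothetical sizes: 4-channel latents at 64x64 plus a 1-channel mask.
rng = np.random.default_rng(0)
latents = [rng.normal(size=(4, 64, 64)) for _ in range(4)]
mask = np.ones((1, 64, 64))
unet_input = assemble_unet_input(*latents, mask)
```

Because the signals only widen the input channel dimension, a design like this needs no new attention branches or adapter modules, which is consistent with the claim that the UNet architecture itself is left unmodified.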

Paper Structure (20 sections, 6 equations, 11 figures, 3 tables)


Figures (11)

  • Figure 1: The results of our method. Given a garment type, we showcase three types of garment texture generation: top, bottom, and one-piece. Each image consists of two sections: the left side displays the reference image alongside the UV position map, which is obtained by rasterizing the corresponding mesh; the right side presents the rendered output of the generated texture, in both front and back views. Our method produces high-quality, 3D-consistent textures across a wide range of garment styles.
  • Figure 2: The three types of data we use: one-piece, top, and bottom. For each type, we show the reference image, UV texture map, UV position map, and UV mask image.
  • Figure 3: The GarmentPainter framework starts by encoding the reference image and UV position map with a VAE (Kingma and Welling, 2013). At the same time, the masked UV texture map is encoded to preserve background regions that should not be regenerated. The reference latent captures style information, while the UV position latent captures structure. These latents, along with the UV texture map and reference image, are transformed into texture latents, then noised to form the noisy latent. The UV mask image is downsampled by 8× to restrict generation to the intended area. All latent features are concatenated and passed into the diffusion model. A type encoder also embeds garment type labels, which are added to the time embedding to provide fine-grained control over garment regions. Note: Each transformer block omits cross-attention for text interaction.
  • Figure 4: Attention maps for $z^{2} \rightarrow z^{1}$ and $z^{1} \rightarrow z^{2}$ demonstrate accurate and effective information exchange between the two diffusion noise predictors $\{\epsilon_{\theta}^{1}, \epsilon_{\theta}^{2}\}$.
  • Figure 5: Qualitative comparison with SOTA methods. Our approach produces garment textures that closely match the reference images, even when characters are present. Compared to other methods, the results are more detailed and maintain better 3D consistency. Please zoom in to view details.
  • ...and 6 more figures
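
Two of the conditioning steps named in the Figure 3 caption, downsampling the UV mask by 8× and adding a garment-type embedding to the time embedding, can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: block-max pooling for the mask, a sinusoidal time embedding, a randomly initialized lookup table standing in for the learned type encoder, and all names (`downsample_mask`, `TypeEncoder`, `conditioned_time_embedding`) are hypothetical.

```python
import numpy as np

# Garment types named in the Figure 1 and Figure 2 captions.
GARMENT_TYPES = ("top", "bottom", "one-piece")


def downsample_mask(mask, factor=8):
    """Downsample a binary (H, W) mask by block-max pooling.

    Assumption: a low-res cell is 'on' if any pixel in its block is on.
    The source only states that the mask is downsampled by 8x.
    """
    H, W = mask.shape
    if H % factor or W % factor:
        raise ValueError("mask size must be divisible by the factor")
    blocks = mask.reshape(H // factor, factor, W // factor, factor)
    return blocks.max(axis=(1, 3))


def sinusoidal_embedding(t, dim):
    """Standard sinusoidal timestep embedding (an assumed choice); dim even."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])


class TypeEncoder:
    """Lookup table standing in for a learned garment-type encoder."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.02, size=(len(GARMENT_TYPES), dim))

    def __call__(self, garment_type):
        return self.table[GARMENT_TYPES.index(garment_type)]


def conditioned_time_embedding(t, garment_type, encoder, dim):
    """Add the type embedding to the time embedding, as the caption states."""
    return sinusoidal_embedding(t, dim) + encoder(garment_type)


# Hypothetical usage: a 512x512 UV mask and an 8-dimensional embedding.
uv_mask = np.zeros((512, 512), dtype=np.float32)
uv_mask[100:300, 50:200] = 1.0
mask_lowres = downsample_mask(uv_mask)          # shape (64, 64)
encoder = TypeEncoder(dim=8)
emb_top = conditioned_time_embedding(0, "top", encoder, dim=8)
emb_bottom = conditioned_time_embedding(0, "bottom", encoder, dim=8)
```

Because the type signal is added to the time embedding, it reaches every block that already consumes the timestep, so one plausible reading of the caption is that garment-type control requires no extra conditioning pathway.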