CTGAN: Semantic-guided Conditional Texture Generator for 3D Shapes

Yi-Ting Pan, Chai-Rong Lee, Shu-Ho Fan, Jheng-Wei Su, Jia-Bin Huang, Yung-Yu Chuang, Hung-Kuo Chu

TL;DR

CTGAN addresses the challenge of producing high-fidelity, view-consistent textures for 3D shapes by conditioning texture generation on semantic segmentation maps and reference style images. It builds on StyleGAN2-ADA with a latent space split into structure and style components, each produced by its own encoder, and enforces semantic alignment through a coarse-to-fine structure encoder. A canonical-view texture atlas parameterizes textures across multiple views, and a three-stage training regime with L2, LPIPS, and MoCo losses yields state-of-the-art results on ShapeNet cars and FFHQ faces in both conditional and unconditional settings. The work enables controllable, semantically guided texture synthesis for 3D objects, with practical value for rapid, consistent texturing in games, films, and AR/VR pipelines. The authors note limitations in handling occlusion and seams at view boundaries.

Abstract

The entertainment industry relies on 3D visual content to create immersive experiences, but traditional methods for creating textured 3D models can be time-consuming and subjective. Generative networks such as StyleGAN have advanced image synthesis, but generating 3D objects with high-fidelity textures is still not well explored, and existing methods have limitations. We propose the Semantic-guided Conditional Texture Generator (CTGAN), producing high-quality textures for 3D shapes that are consistent with the viewing angle while respecting shape semantics. CTGAN utilizes the disentangled nature of StyleGAN to finely manipulate the input latent codes, enabling explicit control over both the style and structure of the generated textures. A coarse-to-fine encoder architecture is introduced to enhance control over the structure of the resulting textures via input segmentation. Experimental results show that CTGAN outperforms existing methods on multiple quality metrics and achieves state-of-the-art performance on texture generation in both conditional and unconditional settings.

Paper Structure

This paper contains 23 sections, 4 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Limitations of existing methods. Existing methods for texturing 3D models either produce unrealistic, blurry results (Texture Fields [OechsleICCV2019]) or generate view-inconsistent textures (LTG [rui2021LTG]).
  • Figure 2: System overview. Given a 3D model $M$ as input, we start with texture parameterization to generate the corresponding UV maps $\mathbb{U}$ and segmentation maps $\mathbb{S}$. The texture generator $G$ then takes the style code $\mathrm{w}^{full}$ as input and generates the texture maps $\mathbb{T}$ based on the segmentation maps $\mathbb{S}$. To ensure view-consistent results, we split the style code $\mathrm{w}^{full}$ into two parts and separately encode the segmentation maps $\mathbb{S}$ and the style image $I$ into the structure representation $\mathrm{w}^{struct}$ and the style representation $\mathrm{w}^{sty}$, using the structure encoder $E_{struct}$ and the style encoder $E_{sty}$, respectively. Finally, we apply the generated texture maps $\mathbb{T}$ to the 3D model $M$ to produce the textured 3D model $R$.
  • Figure 3: (a) Coarse-to-fine structure encoder. The encoder maps segmentation maps $s_i$ to the structure latent code $\mathrm{w}^{struct}$ gradually, from low (coarse) to high (fine) resolution. (b) Training proceeds in three stages: first the texture generator $G$, then the style encoder $E_{sty}$, and finally the structure encoder $E_{struct}$.
  • Figure 4: Qualitative comparison on texture generation. First row: the input style images (only for the conditional part) and the input 3D models. Bottom 3 rows: the generated 3D textured models for each method using the input data from the first row. Our method produces superior results in generating texture maps that are more similar to style images and more view-consistent.
  • Figure 5: Qualitative comparison of different structure.
  • ...and 2 more figures
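
The split of the style code described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that $\mathrm{w}^{full}$ is a list of per-layer latent vectors whose first layers come from the structure encoder and whose remaining layers come from the style encoder. The layer counts, the latent dimension, the ordering of the two parts, and the names `compose_full_code`, `encode_structure_stub`, and `encode_style_stub` are all hypothetical and are not taken from the paper. The stubs only stand in for $E_{struct}$ and $E_{sty}$ and do no real encoding.

```python
from typing import List

Vector = List[float]

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
NUM_STRUCT_LAYERS = 2
NUM_STYLE_LAYERS = 3
LATENT_DIM = 4


def compose_full_code(w_struct: List[Vector], w_sty: List[Vector]) -> List[Vector]:
    """Concatenate per-layer codes into w_full.

    Assumption: structure layers come first, style layers after.
    """
    if not w_struct or not w_sty:
        raise ValueError("both the structure and the style code need at least one layer")
    dim = len(w_struct[0])
    for layer in w_struct + w_sty:
        if len(layer) != dim:
            raise ValueError("all per-layer codes must share the same dimension")
    # Copy each layer so later edits to w_full do not alias the inputs.
    return [list(layer) for layer in w_struct] + [list(layer) for layer in w_sty]


def encode_structure_stub(seg_label: int) -> List[Vector]:
    """Placeholder for E_struct: returns a constant code per segmentation label."""
    return [[float(seg_label)] * LATENT_DIM for _ in range(NUM_STRUCT_LAYERS)]


def encode_style_stub(style_value: float) -> List[Vector]:
    """Placeholder for E_sty: returns a constant code per style value."""
    return [[style_value] * LATENT_DIM for _ in range(NUM_STYLE_LAYERS)]


if __name__ == "__main__":
    # Same structure code, two different style codes.
    w_struct = encode_structure_stub(1)
    w_full_a = compose_full_code(w_struct, encode_style_stub(0.25))
    w_full_b = compose_full_code(w_struct, encode_style_stub(0.75))
    print(len(w_full_a))
    print(w_full_a[:NUM_STRUCT_LAYERS] == w_full_b[:NUM_STRUCT_LAYERS])
```

Under these assumptions, swapping the style image changes only the style layers of $\mathrm{w}^{full}$, while the layers derived from the segmentation maps stay fixed. This is the kind of separate control over structure and style that the caption describes.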