InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting

Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, Ziwei Liu

TL;DR

InTeX tackles text-guided 3D texture synthesis by combining a unified depth-aware inpainting prior with an iterative multi-view synthesis pipeline and an interactive GUI. The core idea is to train a diffusion-based inpainting model that also conditions on depth, so that the generated textures stay aligned with the geometry of the 3D surface; this reduces 3D inconsistencies and brings generation time down to about 30 seconds per instance. Key contributions are (i) a ControlNet-augmented six-channel prior trained on 3D datasets, (ii) a streamlined iterative text-to-texture pipeline with rendering, inpainting, and updating steps, and (iii) a GUI that enables region-specific erasing and repainting with prompt editing. The results show improved texture quality, stronger 3D coherence, and practical efficiency, and the method supports flexible, user-guided 3D texture creation across various Stable Diffusion (SD) checkpoints and prompts.
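
The rendering, inpainting, and updating steps named above can be illustrated with a toy loop. This is only a minimal sketch under strong assumptions, not the authors' implementation: the texture is reduced to a flat list of texels, each camera view is a hypothetical list of visible texel indices, and inpaint is a stand-in that writes a label where the real depth-aware diffusion prior would generate colors.

```python
def render(texture, visible):
    """Gather the texels one view sees, plus a mask of those still unpainted."""
    colors = [texture[i] for i in visible]
    needs_paint = [texture[i] is None for i in visible]
    return colors, needs_paint


def inpaint(colors, needs_paint, label):
    """Hypothetical stand-in for the inpainting prior.

    Keeps already painted entries and fills masked entries with a label.
    A real model would consume the rendered image, mask, depth, and prompt.
    """
    return [label if masked else c for c, masked in zip(colors, needs_paint)]


def update(texture, visible, painted, needs_paint):
    """Write back only texels that were unpainted, so earlier views are kept."""
    for i, color, masked in zip(visible, painted, needs_paint):
        if masked:
            texture[i] = color


def synthesize(num_texels, views, prompt):
    """Iterate over views: render, inpaint the unseen region, update the texture."""
    texture = [None] * num_texels
    for view_id, visible in enumerate(views):
        colors, needs_paint = render(texture, visible)
        if not any(needs_paint):
            continue  # nothing new is visible from this view
        painted = inpaint(colors, needs_paint, f"{prompt}@view{view_id}")
        update(texture, visible, painted, needs_paint)
    return texture


# Three overlapping toy views over eight texels.
views = [[0, 1, 2, 3], [2, 3, 4, 5], [5, 6, 7]]
texture = synthesize(8, views, "wooden chair")
```

In this toy setup, texels shared by two views keep the content from the earlier view, which mirrors the idea of only painting regions that have not been textured yet.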

Abstract

Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models. Existing methods primarily adopt a combination of pretrained depth-aware diffusion and inpainting models, yet they exhibit shortcomings such as 3D inconsistency and limited controllability. To address these challenges, we introduce InTeX, a novel framework for interactive text-to-texture synthesis. 1) InTeX includes a user-friendly interface that facilitates interaction and control throughout the synthesis process, enabling region-specific repainting and precise texture editing. 2) Additionally, we develop a unified depth-aware inpainting model that integrates depth information with inpainting cues, effectively mitigating 3D inconsistencies and improving generation speed. Through extensive experiments, our framework has proven to be both practical and effective in text-to-texture synthesis, paving the way for high-quality 3D content creation.

Paper Structure (29 sections, 4 equations, 11 figures, 1 table)

Figures (11)

  • Figure 1: We propose InTeX, an interactive text-to-texture synthesis framework via unified depth-aware inpainting. Our method supports flexible visualization, inpainting, erasing, and repainting with a graphical user interface.
  • Figure 2: Text-to-texture Synthesis Framework. We iteratively apply our unified depth-aware inpainting prior from multiple camera viewpoints and back-project the images to synthesize texture on the 3D surface.
  • Figure 3: Depth-aware inpainting results of the 2D prior. Our method produces the best depth-aligned inpainting results, while an inpainting-only model may change the geometry and a depth-only model may produce inconsistent content.
  • Figure 4: Mipmap Bilinear Extrapolation. We randomly sample 10% of the pixels and place them back into an empty image using different extrapolation methods. The proposed mipmap bilinear method fills most of the holes.
  • Figure 5: Qualitative comparison with recent methods. Our method generates textures with higher quality and better 3D consistency.
  • ...and 6 more figures
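
The hole-filling experiment described in the Figure 4 caption can be approximated with a pull-push pass over a mipmap pyramid. The sketch below is one plausible reading of the caption, not the paper's exact procedure: the function name pull_push_fill is hypothetical, the image is assumed to be square with a power-of-two side, and the upsampling step uses nearest-neighbour repetition for brevity where the caption refers to bilinear interpolation.

```python
import numpy as np


def pull_push_fill(image, mask):
    """Fill pixels where mask == 0 using a pull-push pass over a mipmap pyramid.

    image: (H, W, C) float array; mask: (H, W) array, 1 = known, 0 = hole.
    Assumes H == W and that the side length is a power of two.
    """
    color = image.astype(np.float64) * mask[..., None]
    valid = mask.astype(np.float64)
    pyramid = [(color, valid)]

    # Pull: halve the resolution, averaging only the known pixels in each 2x2 cell.
    while pyramid[-1][1].shape[0] > 1:
        c, v = pyramid[-1]
        h, w = v.shape
        v_sum = v.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
        c_sum = (c * v[..., None]).reshape(h // 2, 2, w // 2, 2, -1).sum(axis=(1, 3))
        c_avg = c_sum / np.maximum(v_sum, 1.0)[..., None]
        pyramid.append((c_avg, (v_sum > 0).astype(np.float64)))

    # Push: walk back to full resolution, filling holes from the coarser level.
    filled = pyramid[-1][0]
    for c, v in reversed(pyramid[:-1]):
        up = np.repeat(np.repeat(filled, 2, axis=0), 2, axis=1)
        filled = np.where(v[..., None] > 0, c, up)
    return filled


# Keep roughly 10% of the pixels of a random image, then fill the rest.
rng = np.random.default_rng(0)
image = rng.random((16, 16, 3))
mask = (rng.random((16, 16)) < 0.1).astype(np.float64)
mask[0, 0] = 1.0  # guarantee at least one known pixel
filled = pull_push_fill(image, mask)
```

Known pixels are returned unchanged, and every hole takes its value from the finest pyramid level that contains a known pixel covering it, so no holes remain as long as at least one pixel is known.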