InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting
Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, Ziwei Liu
TL;DR
InTeX tackles text-guided 3D texture synthesis by combining a unified depth-aware inpainting prior with an iterative multi-view synthesis pipeline and an interactive GUI. The core idea is to train a single diffusion-based inpainting model that also takes depth as input, so that the generated textures stay consistent with the 3D surface geometry. This reduces 3D inconsistencies and brings generation time down to about 30 seconds per instance. Key contributions include (i) a ControlNet-augmented six-channel prior trained on 3D datasets, (ii) a streamlined iterative text-to-texture pipeline with rendering, inpainting, and updating steps, and (iii) a GUI that enables region-specific erasing and repainting with prompt editing. The results show improved texture quality, stronger 3D coherence, and practical efficiency, enabling flexible, user-guided 3D texture creation across various SD checkpoints and prompts.
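The render, inpaint, and update steps named in contribution (ii) can be illustrated with a toy loop. This is only a minimal sketch under strong simplifying assumptions, not the authors' implementation: the texture is a flat array of texels, each view is a hypothetical precomputed map from screen pixels to texel indices (standing in for rasterization), and `inpaint_stub` fills masked pixels with a flat color where the real system would call the depth-aware diffusion model. All function names are illustrative.

```python
import numpy as np

def render(texture, filled, uv_index):
    """Gather texels into a view; also return the mask of pixels that still lack texture."""
    image = texture[uv_index]          # (h, w, 3) colors seen from this view
    need_inpaint = ~filled[uv_index]   # (h, w) True where no texture exists yet
    return image, need_inpaint

def inpaint_stub(image, mask, depth, color):
    """Placeholder for the inpainting model (assumption): paints masked pixels a flat color.
    A real model would condition on the prompt, the mask, and the depth map."""
    out = image.copy()
    out[mask] = color
    return out

def update(texture, filled, uv_index, image, mask):
    """Write newly generated pixels back to the texels they were rendered from."""
    idx = uv_index[mask]
    texture[idx] = image[mask]
    filled[idx] = True

def synthesize(num_texels, views, colors):
    """Iterate over views: render, inpaint the unseen region, update the texture."""
    texture = np.zeros((num_texels, 3), dtype=np.float32)
    filled = np.zeros(num_texels, dtype=bool)
    for (uv_index, depth), color in zip(views, colors):
        image, mask = render(texture, filled, uv_index)
        if not mask.any():             # nothing new is visible from this view
            continue
        image = inpaint_stub(image, mask, depth, color)
        update(texture, filled, uv_index, image, mask)
    return texture, filled

# Two overlapping 2x2 views over six texels; depth maps are dummies here.
views = [
    (np.array([[0, 1], [2, 3]]), np.ones((2, 2), dtype=np.float32)),
    (np.array([[2, 3], [4, 5]]), np.ones((2, 2), dtype=np.float32)),
]
colors = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]
texture, filled = synthesize(6, views, colors)
```

In this sketch, texels already painted by an earlier view are excluded from the mask, so the second view only fills what the first one could not see. That is the property that lets an inpainting-style model extend a texture view by view without overwriting earlier results.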
Abstract
Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models. Existing methods primarily adopt a combination of pretrained depth-aware diffusion and inpainting models, yet they exhibit shortcomings such as 3D inconsistency and limited controllability. To address these challenges, we introduce InTeX, a novel framework for interactive text-to-texture synthesis. 1) InTeX includes a user-friendly interface that facilitates interaction and control throughout the synthesis process, enabling region-specific repainting and precise texture editing. 2) Additionally, we develop a unified depth-aware inpainting model that integrates depth information with inpainting cues, effectively mitigating 3D inconsistencies and improving generation speed. Through extensive experiments, our framework has proven to be both practical and effective in text-to-texture synthesis, paving the way for high-quality 3D content creation.
