MD-ProjTex: Texturing 3D Shapes with Multi-Diffusion Projection
Ahmet Burak Yildirim, Mustafa Utku Aydogdu, Duygu Ceylan, Aysegul Dundar
TL;DR
MD-ProjTex generates text-guided textures for arbitrary 3D shapes without training a model or optimizing at run time. It introduces a UV-space multi-diffusion framework that fuses per-view denoising directions at each diffusion step, so all views are generated in parallel and stay consistent with one another. Its key components are encoder–decoder–denoising with Modified Denoising Steps, multi-scale texture generation, normal-guided weighting, camera-view selection via K-Means, and simple post-processing. Empirically, the method achieves better FID/KID scores and faster runtimes than state-of-the-art baselines, and user studies confirm a perceptual preference for its textures, making it practical for fast, high-quality 3D texture synthesis.
Abstract
We introduce MD-ProjTex, a method for fast and consistent text-guided texture generation for 3D shapes using pretrained text-to-image diffusion models. At the core of our approach is a multi-view consistency mechanism in UV space, which ensures coherent textures across different viewpoints. Specifically, MD-ProjTex fuses noise predictions from multiple views at each diffusion step and jointly updates the per-view denoising directions to maintain 3D consistency. In contrast to existing state-of-the-art methods that rely on optimization or sequential view synthesis, MD-ProjTex is computationally more efficient and achieves better quantitative and qualitative results.
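The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fuse_in_uv`, the flat pixel-to-texel index arrays, and the use of a simple weighted average are all assumptions made for illustration. The per-pixel weights stand in for something like a normal-guided weight (for example, the cosine between the surface normal and the view direction), which is also an assumption here.

```python
import numpy as np

def fuse_in_uv(view_preds, texel_ids, weights, num_texels):
    """Fuse per-view noise predictions in a shared UV texture (illustrative sketch).

    view_preds: list of (P_v, C) arrays, per-pixel predictions for each view
    texel_ids:  list of (P_v,) int arrays, the UV texel each pixel maps to
                (assumed to be precomputed by a rasterizer)
    weights:    list of (P_v,) float arrays, per-pixel confidence per view
    num_texels: total number of texels in the UV texture
    Returns the fused (num_texels, C) texture and the per-view read-backs.
    """
    channels = view_preds[0].shape[1]
    acc = np.zeros((num_texels, channels))
    wsum = np.zeros(num_texels)
    for pred, ids, w in zip(view_preds, texel_ids, weights):
        # np.add.at accumulates correctly even when several pixels hit one texel
        np.add.at(acc, ids, pred * w[:, None])
        np.add.at(wsum, ids, w)
    # Weighted average per texel; texels seen by no view stay at zero
    fused = acc / np.maximum(wsum, 1e-8)[:, None]
    # Each view reads back the shared value, so overlapping views agree
    per_view = [fused[ids] for ids in texel_ids]
    return fused, per_view

# Toy example: two views, two texels, one channel; texel 0 is seen by both views
preds = [np.array([[1.0], [3.0]]), np.array([[5.0]])]
ids = [np.array([0, 1]), np.array([0])]
wts = [np.array([1.0, 1.0]), np.array([3.0])]
fused, per_view = fuse_in_uv(preds, ids, wts, num_texels=2)
```

In this toy setup the second view carries a larger weight on the shared texel, so the fused value is pulled toward its prediction, and both views then receive the same value for that texel. Repeating such a step at every diffusion iteration is one way per-view denoising directions could be kept consistent.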
