Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu

TL;DR

Paint3D uses a coarse-to-fine process to produce high-quality 2K UV textures that maintain semantic consistency while being lighting-less, significantly advancing the state of the art in texturing 3D objects.

Abstract

This paper presents Paint3D, a novel coarse-to-fine generative framework that produces high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes, conditioned on text or image inputs. The key challenge addressed is generating high-quality textures without embedded illumination information, so that the textures can be relit or re-edited within modern graphics pipelines. To achieve this, our method first leverages a pre-trained depth-aware 2D diffusion model to generate view-conditional images and performs multi-view texture fusion, producing an initial coarse texture map. However, because 2D models can neither fully represent 3D shapes nor disable lighting effects, the coarse texture map exhibits incomplete areas and illumination artifacts. To resolve this, we train separate UV Inpainting and UVHD diffusion models, specialized respectively for the shape-aware refinement of incomplete areas and the removal of illumination artifacts. Through this coarse-to-fine process, Paint3D can produce high-quality 2K UV textures that maintain semantic consistency while being lighting-less, significantly advancing the state of the art in texturing 3D objects.
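
To make the coarse stage more concrete, the following is a minimal sketch of how multi-view texture fusion could produce a coarse UV texture together with a mask of incomplete areas. It is an illustration under stated assumptions, not the authors' implementation: the function name `fuse_views`, the per-view confidence weights, and the weighted-average blending rule are all hypothetical, and the abstract does not specify how Paint3D actually fuses views. The sketch assumes each view has already been back-projected into UV space.

```python
import numpy as np


def fuse_views(view_textures, view_weights, eps=1e-8):
    """Blend per-view UV textures into one coarse texture (hypothetical helper).

    view_textures: array of shape (V, H, W, 3), the colors each of the V views
                   contributes after back-projection into UV space.
    view_weights:  array of shape (V, H, W), non-negative confidences that are
                   0 wherever a texel is not visible from that view.
    Returns (coarse, missing): an (H, W, 3) texture and an (H, W) boolean mask
    of texels that no view covered.
    """
    textures = np.asarray(view_textures, dtype=np.float64)
    weights = np.asarray(view_weights, dtype=np.float64)

    # Total confidence per texel, and the confidence-weighted color sum.
    total = weights.sum(axis=0)                              # (H, W)
    blended = (textures * weights[..., None]).sum(axis=0)    # (H, W, 3)

    # Texels seen by no view are the "incomplete areas" left for a later
    # refinement stage (assumption: they are simply left black here).
    missing = total <= eps
    coarse = np.zeros_like(blended)
    coarse[~missing] = blended[~missing] / total[~missing][:, None]
    return coarse, missing


if __name__ == "__main__":
    # Two toy views of a 2x2 UV map: view 0 is red, view 1 is blue.
    tex = np.zeros((2, 2, 2, 3))
    tex[0, ..., 0] = 1.0
    tex[1, ..., 2] = 1.0
    w = np.zeros((2, 2, 2))
    w[0, :, 0] = 1.0   # view 0 sees the left column
    w[1, 0, 0] = 1.0   # view 1 sees only the top-left texel
    coarse, missing = fuse_views(tex, w)
    print(coarse[0, 0], missing)
```

In this toy setup the right column is covered by no view, so it ends up in the `missing` mask; in the pipeline the abstract describes, such regions are what the UV Inpainting model is trained to fill, while the UVHD model addresses the remaining illumination artifacts.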

Paper Structure (1 figure, 2 tables)

This paper contains 1 figure, 2 tables.

Figures (1)

  • Figure 12: Visualization of the t-SNE results on the evolved latent codes $\hat{z}_t$ during the reverse diffusion process (inference) on the action-to-motion task. $t$ is the diffusion step, indexed in the order of the forward diffusion trajectory. $\hat{z}_{t=49}$ is the initial random noise, and $\hat{z}_{t=0}$ is our prediction. We sample 30 motions for each action label.