
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

TL;DR

Paint-it addresses text-guided, high-fidelity PBR texture synthesis for 3D meshes by introducing DC-PBR, a neural re-parameterization of texture maps that outputs the diffuse ($K^d$), roughness and metalness ($K^{rm}$), and normal ($K^n$) maps from a fixed latent input. It uses Score-Distillation Sampling (SDS) to guide optimization but observes that SDS gradients are noisy; DC-PBR inherently imposes a frequency-aware curriculum that fits low-frequency content first and filters out noisy high-frequency signals. The method shows strong qualitative and quantitative performance across diverse meshes, supports test-time relighting and material control in graphics engines, and includes ablations validating the importance of both the PBR texture representation and the neural re-parameterization. The approach offers a practical path to scalable, text-driven production of photorealistic textured 3D assets, with potential extensions to animation and large-scale scenes. Together, $L_{SDS}$ guidance and the DC-PBR prior enable robust, coherent texture synthesis from natural language prompts within feasible runtimes on modern GPUs.

Abstract

We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains remarkable-quality PBR texture maps within 15 minutes, given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: https://kim-youwang.github.io/paint-it
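
For reference, the optimization described in the abstract can be written compactly. The notation here is an assumption made for illustration and is not taken from the text above: $f_\theta$ is the randomly initialized convolutional network, $z$ its fixed input, $R$ a differentiable renderer, $M$ the mesh, $c$ a sampled camera, $y$ the text prompt, and $\epsilon_\phi$ the noise predictor of a pretrained diffusion model. The gradient is the standard SDS form from the text-to-3D literature, assumed to apply here with the network weights as the optimized variables:

$$
(K^d, K^{rm}, K^n) = f_\theta(z), \qquad x = R(M, K^d, K^{rm}, K^n, c)
$$

$$
\nabla_\theta \mathcal{L}_{SDS} = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\epsilon_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right]
$$

Here $x_t$ is the rendering $x$ noised to diffusion timestep $t$, $\epsilon$ is the injected Gaussian noise, and $w(t)$ is a timestep-dependent weight. In a pixel-based parameterization, $\theta$ would be the texel values themselves; under DC-PBR, $\theta$ is the set of network weights, so every update passes through the convolutional layers.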

Paper Structure (43 sections, 6 equations, 18 figures, 2 tables)

Figures (18)

  • Figure 1: Paint-it. Given an untextured 3D mesh and the text description describing the desired appearance of the 3D mesh, Paint-it automatically synthesizes high-fidelity physically-based rendering (PBR) texture maps by neural re-parameterized texture map optimization.
  • Figure 2: Paint-it: Practical applications. Using the synthesized PBR texture maps of Paint-it and commercial graphics engines, e.g., Blender, we can (1) relight the mesh by changing High-Dynamic Range (HDR) environmental lighting (see the balls) and (2) control the material properties at test-time. We can also simulate diverse appearance by synthesizing different PBR texture maps for the same mesh.
  • Figure 3: Paint-it: Overall pipeline. Given a 3D object mesh without a texture and a text describing the desired appearance of the mesh, Paint-it synthesizes realistic PBR texture maps via synthesis-through-optimization. We introduce DC-PBR, which parameterizes the PBR texture map into randomly initialized U-Net convolutional neural kernels. By performing texture mapping to texturize the given mesh, we differentiably rasterize the textured mesh and obtain multi-view images, then compute the diffusion-guided loss. Note that Paint-it optimizes the neural parameters of the U-Net rather than directly optimizing the pixel values of the texture map.
  • Figure 4: Frequency scheduling of neural re-parameterized texture optimization. For each iteration, we investigate the energies of the frequency components of the reconstructed (a,b) / synthesized (c,d) texture maps. The pixel optimization (a,c) fits and increases all frequency bands and suffers from fitting high-frequency texture contents from the initial stages, yielding degraded quality texture maps. In contrast, our proposed neural re-parameterization (b,d) naturally schedules which frequencies to focus on, thus obtaining coarse-to-fine texture synthesis with robustness to noisy supervision, e.g., SDS loss, and yielding high-quality texture maps.
  • Figure 5: Qualitative results. We take diverse 3D meshes from Objaverse, RenderPeople, and SMAL, then synthesize texture maps with our manual text prompts. We visualize the original and rendered meshes with our synthesized PBR texture maps. Paint-it can model diverse material properties, e.g., the metallic surface of a crown, the rough surface of a mushroom, realistic human skin tones, front-to-back appearance consistency, and complicated patterns of the animal's appearance. See supplementary material for more results.
  • ...and 13 more figures
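
The per-frequency energy analysis described in the Figure 4 caption can be illustrated with a small sketch. This is not the authors' analysis code: the function names `dft2` and `band_energies`, the choice of three radial bands, and the toy 8×8 textures are all assumptions made for illustration. It uses a naive discrete Fourier transform from the Python standard library so it runs without extra dependencies; a real analysis on full-resolution textures would use an FFT.

```python
import cmath
import math


def dft2(img):
    """Naive 2D DFT of a square grayscale image given as a list of rows."""
    n = len(img)
    # Twiddle factors w[k][x] = exp(-2*pi*i*k*x/n), shared by both passes.
    w = [[cmath.exp(-2j * math.pi * k * x / n) for x in range(n)] for k in range(n)]
    # Transform along x for every row, then along y for every column.
    rows = [[sum(img[y][x] * w[u][x] for x in range(n)) for u in range(n)]
            for y in range(n)]
    return [[sum(rows[y][u] * w[v][y] for y in range(n)) for u in range(n)]
            for v in range(n)]


def band_energies(img, num_bands=3):
    """Sum spectral energy |F|^2 inside radial frequency bands (low to high)."""
    n = len(img)
    spec = dft2(img)
    energies = [0.0] * num_bands
    max_r = math.hypot(n // 2, n // 2)  # largest radius on the frequency grid
    for v in range(n):
        for u in range(n):
            # Indices above n/2 correspond to negative frequencies.
            fu = u if u <= n // 2 else u - n
            fv = v if v <= n // 2 else v - n
            r = math.hypot(fu, fv)
            band = min(int(num_bands * r / max_r), num_bands - 1)
            energies[band] += abs(spec[v][u]) ** 2
    return energies


if __name__ == "__main__":
    n = 8
    # Hypothetical toy textures: one smooth gradient-like wave, one checkerboard.
    smooth = [[math.cos(2 * math.pi * x / n) for x in range(n)] for _ in range(n)]
    checker = [[(-1.0) ** (x + y) for x in range(n)] for y in range(n)]
    print("smooth  (low, mid, high):", band_energies(smooth))
    print("checker (low, mid, high):", band_energies(checker))
```

Evaluating `band_energies` on the texture map at each optimization iteration and plotting the bands over time would give curves of the kind the caption describes: a coarse-to-fine schedule shows up as the low band rising first and the high band rising later.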