Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll
TL;DR
Paint-it addresses text-guided, high-fidelity PBR texture synthesis for 3D meshes. It introduces DC-PBR, a neural re-parameterization in which a randomly initialized convolutional network maps a fixed latent input to the diffuse ($K^d$), roughness and metalness ($K^{rm}$), and normal ($K^n$) texture maps. Optimization is guided by the Score-Distillation Sampling loss $L_{SDS}$, whose gradients the authors observe to be noisy. DC-PBR inherently imposes a frequency-aware curriculum: low-frequency content is fitted first, and noisy high-frequency signals are filtered out. The method is evaluated qualitatively and quantitatively across diverse meshes, supports test-time relighting and material control in graphics engines, and includes ablations that support the importance of both the PBR texture representation and the neural re-parameterization. Textures are synthesized from a natural-language prompt within 15 minutes, which offers a practical path toward scalable, text-driven production of photorealistic textured 3D assets, with potential extensions to animation and large-scale scenes.
Abstract
We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains high-quality PBR texture maps within 15 minutes, given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: https://kim-youwang.github.io/paint-it
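To make the role of the parameterization concrete, the optimization described above can be sketched in the standard SDS notation. This is a hedged sketch rather than the paper's exact formulation: the symbols $f_\theta$ (the convolutional network), $z$ (the fixed latent input), $R$ (the differentiable PBR renderer), $M$ (the mesh), $c$ (a sampled camera), $y$ (the text prompt), $w(t)$ (a timestep weighting), and $\hat{\epsilon}_\phi$ (the pretrained diffusion model's noise prediction) are assumed here for illustration.

$$
(K^d, K^{rm}, K^n) = f_\theta(z), \qquad x = R(M, K^d, K^{rm}, K^n; c)
$$

$$
\nabla_\theta L_{SDS} = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right]
$$

Here $x_t$ denotes the rendering $x$ after noise $\epsilon$ is added at diffusion timestep $t$. Under a pixel-based parameterization, $\theta$ would be the texel values themselves, so each texel receives the noisy residual $\hat{\epsilon}_\phi - \epsilon$ more or less directly. Under DC-PBR, $\theta$ is the set of convolution weights, so the same residual reaches the texture maps only through the shared kernels of $f_\theta$, which is where the frequency-dependent filtering described in the abstract would arise.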
