SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner
TL;DR
SceneTex introduces a diffusion-prior-based texture synthesis framework for indoor scenes that optimizes textures directly in RGB space. It uses a multiresolution texture field to capture details and a cross-attention decoder to enforce global style consistency across instances. Using depth-conditioned diffusion priors and a VSD-based objective, SceneTex achieves higher texture quality and prompt fidelity on 3D-FRONT scenes than previous methods, both quantitatively and in user studies. Shading artifacts remain a limitation, but the approach offers a scalable path to high-quality, style-controlled 3D scene texturing.
Abstract
We propose SceneTex, a novel method for generating high-quality, style-consistent textures for indoor scenes using depth-to-image diffusion priors. Unlike previous methods that either iteratively warp 2D views onto a mesh surface or distill diffusion latent features without accurate geometric and style cues, SceneTex formulates texture synthesis as an optimization problem in RGB space, where style and geometry consistency are properly reflected. At its core, SceneTex uses a multiresolution texture field to implicitly encode the mesh appearance. We optimize the target texture via a score-distillation-based objective function on the corresponding RGB renderings. To further secure style consistency across views, we introduce a cross-attention decoder that predicts RGB values by cross-attending to pre-sampled reference locations in each instance. SceneTex enables diverse and accurate texture synthesis for 3D-FRONT scenes, demonstrating significant improvements in visual quality and prompt fidelity over prior texture generation methods.
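The two components named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the texture field is a stack of dense UV feature grids sampled by bilinear interpolation, and that the decoder's cross-attention step is a single-head softmax attention with no learned projections. The names `bilinear`, `sample_field`, and `cross_attend` are hypothetical, and the final mapping from features to RGB is omitted.

```python
import math

def bilinear(grid, u, v):
    """Bilinearly interpolate grid[row][col] (feature vectors) at (u, v) in [0, 1]."""
    rows, cols = len(grid), len(grid[0])
    x, y = u * (cols - 1), v * (rows - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(grid[0][0])):
        top = (1 - tx) * grid[y0][x0][c] + tx * grid[y0][x1][c]
        bottom = (1 - tx) * grid[y1][x0][c] + tx * grid[y1][x1][c]
        out.append((1 - ty) * top + ty * bottom)
    return out

def sample_field(levels, u, v):
    """Query every resolution level at the same UV point and concatenate the features."""
    feats = []
    for grid in levels:
        feats.extend(bilinear(grid, u, v))
    return feats

def cross_attend(query, references):
    """Softmax attention of one query over reference features (keys == values here)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * r for q, r in zip(query, ref)) / scale for ref in references]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(references[0])
    return [sum(w * ref[c] for w, ref in zip(weights, references)) for c in range(dim)]

# Toy usage: a 2x2 coarse level and a 3x3 fine level, one channel each.
coarse = [[[0.0], [1.0]], [[2.0], [3.0]]]
fine = [[[float(r * 3 + c)] for c in range(3)] for r in range(3)]
levels = [coarse, fine]

# Features at pre-sampled reference locations of the same (hypothetical) instance.
references = [sample_field(levels, u, v) for u, v in [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]]
query = sample_field(levels, 0.25, 0.75)
styled = cross_attend(query, references)
```

In this sketch the attended feature is a convex combination of the reference features, which is one simple way to see how every queried point in an instance is pulled toward a shared set of style cues.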
