RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting

Qi Wang, Ruijie Lu, Xudong Xu, Jingbo Wang, Michael Yu Wang, Bo Dai, Gang Zeng, Dan Xu

TL;DR

RoomTex is a coarse-to-fine, text-driven pipeline for texturing indoor scenes. It first creates a depth-guided panoramic reference ($D_p$, $I_p$) and then iteratively textures each object from multiple viewpoints, guided by depth maps and ControlNet. It handles occlusions and geometry imperfections through empty-room refinement, depth-aware inpainting, and a misalignment removal step that uses RGB and depth edge cues. By blending panorama guidance with view-wise inpainting, the method supports interactive fine-grained texture control and scene editing while keeping textures coherent across compositional rooms. Experiments show high-fidelity, style-consistent textures with improved 3D consistency and editing flexibility. The authors acknowledge that a single run does not fully cover all views and point to multi-view diffusion as future work.

Abstract

The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene perfectly due to style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, to generate high-fidelity and style-consistent textures for untextured compositional scene meshes. In the coarse stage, RoomTex first unwraps the scene mesh to a panoramic depth map and leverages ControlNet to generate a room panorama, which is regarded as the coarse reference to ensure the global texture consistency. In the fine stage, based on the panoramic image and perspective depth maps, RoomTex will refine and texture every single object in the room iteratively along a series of selected camera views, until this object is completely painted. Moreover, we propose to maintain superior alignment between RGB and depth spaces via subtle edge detection methods. Extensive experiments show our method is capable of generating high-quality and diverse room textures, and more importantly, supporting interactive fine-grained texture control and flexible scene editing thanks to our inpainting-based framework and compositional mesh input. Our project page is available at https://qwang666.github.io/RoomTex/.
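To make the role of the panoramic depth map in the coarse stage more concrete, the sketch below lifts an equirectangular depth map to 3D points, which is the inverse of unwrapping the mesh into a panorama. It is only an illustration: the function name `panorama_depth_to_points`, the axis convention (y up), and the assumption that depth stores Euclidean distance from the camera centre are choices made here, not details taken from the paper.

```python
import numpy as np

def panorama_depth_to_points(depth):
    """Unproject an equirectangular depth map (H, W) to 3D points (H, W, 3).

    Assumed convention (not from the paper): depth is Euclidean distance from
    the camera centre, longitude spans [-pi, pi) from left to right, latitude
    spans pi/2 (top) to -pi/2 (bottom), and y is the up axis.
    """
    h, w = depth.shape
    # Map pixel centres to spherical angles.
    lon = ((np.arange(w) + 0.5) / w) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - ((np.arange(h) + 0.5) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both have shape (H, W)
    # Unit ray direction for every pixel on the sphere.
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    # Scale each ray by its depth to get a point in camera coordinates.
    return dirs * depth[..., None]
```

Under these assumptions, every pixel of the generated panorama can be attached to the 3D point its depth value describes, which is what lets a single panoramic image act as a global reference for the per-object views in the fine stage.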

Paper Structure (22 sections, 13 equations, 23 figures, 2 tables)

Figures (23)

  • Figure 1: We propose RoomTex to synthesize high-quality and style-consistent textures for given scene meshes. Our method supports generating multiple styles.
  • Figure 2: RoomTex simultaneously enables interactive fine-grained texture control and flexible scene editing of individual objects inside.
  • Figure 3: Framework of RoomTex. We first generate a panoramic reference image of the indoor scene based on a depth map rendered from a compositional untextured room mesh. Based on the panorama, we refine and paint every object to obtain a textured 3D object. By integrating the objects and the empty room, we finally get a completely textured 3D indoor scene.
  • Figure 4: Iterative inpainting. We leverage the object depth to unproject only object areas of the initial image to the world coordinates. Then, we choose a group of suitable views and iteratively warp the 3D object to these views, under which the untextured area will be filled with diffusion-based inpainting (dense areas) and interpolation-based inpainting (sparse areas).
  • Figure 5: Misalignment removal. We first get the Canny edges of RGB images and Laplacian edges of depth maps as shown in (a) and (b). (c) shows the misalignment areas between texture and depth, which will be removed during the unprojection.
  • ...and 18 more figures
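
The unprojection step described in the captions of Figures 4 and 5 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a pinhole camera with z-depth, and it discards pixels near depth discontinuities using only a 4-neighbour Laplacian of the depth map, omitting the comparison against Canny edges of the RGB image that the paper describes. The names `depth_laplacian_edges`, `unproject_object`, and `edge_thresh` are hypothetical.

```python
import numpy as np

def depth_laplacian_edges(depth, thresh):
    """Flag pixels whose 4-neighbour Laplacian of depth exceeds `thresh`."""
    padded = np.pad(depth, 1, mode="edge")
    lap = (
        padded[:-2, 1:-1] + padded[2:, 1:-1]    # up + down neighbours
        + padded[1:-1, :-2] + padded[1:-1, 2:]  # left + right neighbours
        - 4.0 * depth
    )
    return np.abs(lap) > thresh

def unproject_object(depth, mask, K, cam_to_world, edge_thresh=0.1):
    """Lift object pixels to world-space points, skipping depth edges.

    depth: (H, W) z-depth along the optical axis (assumed convention).
    mask: (H, W) boolean object mask.
    K: (3, 3) pinhole intrinsics; cam_to_world: (4, 4) camera pose.
    Returns the (N, 3) points and the (v, u) pixel indices that were kept.
    """
    # Keep object pixels that are not on a depth discontinuity.
    keep = mask & ~depth_laplacian_edges(depth, edge_thresh)
    v, u = np.nonzero(keep)
    z = depth[v, u]
    # Homogeneous pixel coordinates, shape (3, N).
    pix = np.stack([u.astype(float), v.astype(float), np.ones_like(z)], axis=0)
    # Back-project through the intrinsics and scale by depth.
    cam = (np.linalg.inv(K) @ pix) * z
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    world = (cam_to_world @ cam_h)[:3].T
    return world, (v, u)
```

Only the kept pixels would carry their colours into 3D. The dropped pixels near object boundaries would be left untextured and filled later by inpainting from other views, which is the purpose of removing misaligned regions before unprojection.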