3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework

Tobias Sautter, Jan-Niklas Dihlmann, Hendrik P. A. Lensch

TL;DR

This work tackles producing editable 3D indoor scenes from a single image by integrating segmentation, context-aware inpainting, and 2D-to-3D asset generation within a differentiable, geometry-constrained framework. The authors introduce a 4-DoF ground-alignment PlanarModel and an Application-Querying inpainting mechanism to ensure physically plausible layouts and complete backgrounds. Their pipeline achieves state-of-the-art performance on synthetic benchmarks and generalizes to real and outdoor scenes, delivering coherent, production-ready assets suitable for VFX and games. Overall, 3D-RE-GEN reduces manual modeling time and enables reliable, camera-consistent scene reconstruction from a single view.

Abstract

Recent advances in 3D scene generation produce visually appealing output, but current representations hinder artists' workflows that require modifiable 3D textured mesh scenes for visual effects and game development. Despite significant advances, current textured mesh scene reconstruction methods are far from artist-ready, suffering from incorrect object decomposition, inaccurate spatial relationships, and missing backgrounds. We present 3D-RE-GEN, a compositional framework that reconstructs a single image into textured 3D objects and a background. We show that combining state-of-the-art models from specific domains achieves state-of-the-art scene reconstruction performance, addressing artists' requirements. Our reconstruction pipeline integrates models for asset detection, reconstruction, and placement, pushing certain models beyond their originally intended domains. Obtaining occluded objects is treated as an image editing task with generative models to infer and reconstruct with scene-level reasoning under consistent lighting and geometry. Unlike current methods, 3D-RE-GEN generates a comprehensive background that spatially constrains objects during optimization and provides a foundation for realistic lighting and simulation tasks in visual effects and games. To obtain physically realistic layouts, we employ a novel 4-DoF differentiable optimization that aligns reconstructed objects with the estimated ground plane. 3D-RE-GEN achieves state-of-the-art performance in single-image 3D scene reconstruction, producing coherent, modifiable scenes through compositional generation guided by precise camera recovery and spatial optimization.
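
The idea of constraining an object to the ground plane with four degrees of freedom can be illustrated with a small sketch. The abstract does not state which four parameters are optimized, so the parameterization below (two in-plane translations, one rotation about the plane normal, one uniform scale) is an assumption, and every function name is hypothetical. It is not the authors' implementation, and it uses plain Python rather than a differentiable framework.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def plane_basis(normal):
    """Return a right-handed orthonormal basis (u, v, n); u and v span the plane."""
    n = normalize(normal)
    # Pick a helper axis that is not (nearly) parallel to the normal.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(helper, n))
    v = cross(n, u)
    return u, v, n

def place_point(local, params, plane_point, basis):
    """Map an object-space point to world space with an ASSUMED 4-DoF pose.

    local:  (x, y, z) with z = height above the object's base (base at z = 0)
    params: (a, b, yaw, s) = in-plane shift along u and v, rotation about the
            plane normal in radians, uniform scale
    """
    a, b, yaw, s = params
    x, y, z = local
    u, v, n = basis
    c, si = math.cos(yaw), math.sin(yaw)
    along_u = a + s * (c * x - si * y)
    along_v = b + s * (si * x + c * y)
    along_n = s * z  # no free offset along the normal: the base stays on the plane
    return tuple(plane_point[i] + along_u * u[i] + along_v * v[i] + along_n * n[i]
                 for i in range(3))

def height_above_plane(point, plane_point, normal):
    """Signed distance from a world-space point to the plane (normal must be unit)."""
    diff = tuple(p - q for p, q in zip(point, plane_point))
    return dot(diff, normal)
```

Because the pose has no free offset along the plane normal and no tilt, any base point (z = 0) lands exactly on the plane for every choice of the four parameters. An optimizer can therefore adjust position, heading, and size without lifting the object off the floor or sinking it below.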

Paper Structure

This paper contains 23 sections, 1 equation, 9 figures, 2 tables.

Figures (9)

  • Figure 1: 3D-RE-GEN in action: A single image is decomposed into a clean background and a complete 3D scene with individual 3D objects, creating a production-ready asset for immediate use in VFX and games. Project page: https://3dregen.jdihlmann.com/
  • Figure 2: 3D-RE-GEN Framework Overview. Our framework converts a single image into a complete 3D scene. First, we segment the image: these masks provide the 2D silhouette loss, while our novel Application-Querying (A-Q) model generates clean, inpainted object images for 3D meshing. In parallel, the input and a generated "empty room" image are used to extract the camera and scene point cloud, which is masked to create the 3D geometric loss. Finally, our Scene Positioning model assembles the 3D assets and background by minimizing both losses, using a novel 4-DoF constrained workflow to ensure all ground-based objects are physically aligned to the floor.
  • Figure 3: Application-Querying. Visual example of how we utilize a GUI-style interface to provide better scene context to the image manipulation model.
  • Figure 4: Qualitative comparison of different methods across input scenes. The inputs are four scenes based on synthetic datasets and two real images. The bottom row shows a test on an outdoor image.
  • Figure 5: Quantitative survey with 59 participants on how people perceive scene reconstruction method outputs.
  • ...and 4 more figures