MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models

Shaoheng Fang; Chaohui Yu; Fan Wang; Qixing Huang

TL;DR

MVRoom addresses controllable 3D indoor scene generation with a two-stage, layout-conditioned novel view synthesis (NVS) pipeline: the first stage converts a coarse 3D layout into multi-view conditioning signals, and the second uses a diffusion model with layout-aware epipolar attention to enforce cross-view consistency. A recursive scene-generation framework explores camera trajectories guided by the layout and maintains a global point cloud for global coherence, and the result is reconstructed with 3D Gaussian Splatting (3D-GS). Experiments on 3D-FRONT show improved multi-view consistency and perceptual quality over baseline methods, and ablations validate the key components. The approach supports text-to-scene generation and scene completion, with potential applications in AR/VR content creation and immersive environments.

Abstract

We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to effectively bridge the 3D layout and consistent image-based condition signals for multi-view generation. The second stage performs image-conditioned multi-view generation, incorporating a layout-aware epipolar attention mechanism to enhance multi-view consistency during the diffusion process. Additionally, we introduce an iterative framework that generates 3D scenes with varying numbers of objects and scene complexities by recursively performing multi-view generation (MVRoom), supporting text-to-scene generation. Experimental results demonstrate that our approach achieves high-fidelity and controllable 3D scene generation for NVS, outperforming state-of-the-art baseline methods both quantitatively and qualitatively. Ablation studies further validate the effectiveness of key components within our generation pipeline.
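The abstract does not spell out how the layout-aware epipolar attention is computed. As background only, the following minimal sketch shows the standard two-view epipolar constraint that epipolar attention mechanisms generally build on: a pixel in one view may only correspond to pixels near a specific line in the other view. It is not the authors' implementation and omits the layout-aware part entirely. All function names, the pose convention (`X2 = R @ X1 + t`), the use of normalized image coordinates, and the distance threshold are assumptions made for illustration.

```python
import math

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

def essential_matrix(R, t):
    """Essential matrix E = [t]_x R for the assumed pose X2 = R X1 + t."""
    return matmul(skew(t), R)

def epipolar_distance(E, x1, x2):
    """Distance from x2 (view 2) to the epipolar line of x1 (view 1).

    Both points are (u, v) pairs in normalized image coordinates.
    """
    a, b, c = matvec(E, [x1[0], x1[1], 1.0])
    norm = math.hypot(a, b)
    if norm == 0.0:
        # Degenerate line (x1 sits at the epipole): no constraint applies.
        return 0.0
    return abs(a * x2[0] + b * x2[1] + c) / norm

def epipolar_mask(E, x1, targets, threshold):
    """True for each target pixel within `threshold` of x1's epipolar line."""
    return [epipolar_distance(E, x1, x2) <= threshold for x2 in targets]
```

In an attention layer, a mask of this kind would typically restrict which key positions in a second view a query pixel may attend to. For a purely sideways camera translation with no rotation, the epipolar line of a pixel is the horizontal line through the same row in the other view.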
