Coherent 3D Scene Diffusion From a Single RGB Image

Manuel Dahnert; Angela Dai; Norman Müller; Matthias Nießner

TL;DR

This work casts single-view 3D scene reconstruction as a conditional diffusion process that jointly infers the poses and geometries of all objects in the scene. It introduces an intra-scene, attention-based diffusion prior that models inter-object relationships, and a surface-alignment loss that samples points from an expressive intermediate shape representation so that training works with partial ground truth. The method achieves state-of-the-art results on SUN RGB-D (12.04% improvement in AP3D) and Pix3D (13.43% increase in F-Score), improving both 3D scene reconstruction metrics and single-object shape quality, while generalizing to unseen indoor data and enabling unconditional shape synthesis. More coherent 3D scene understanding from monocular input is relevant to robotics, AR/VR content creation, and immersive environments.

Abstract

We present a novel diffusion-based approach for coherent 3D scene reconstruction from a single RGB image. Our method utilizes an image-conditioned 3D scene diffusion model to simultaneously denoise the 3D poses and geometries of all objects within the scene. Motivated by the ill-posed nature of the task and to obtain consistent scene reconstruction results, we learn a generative scene prior by conditioning on all scene objects simultaneously to capture the scene context and by allowing the model to learn inter-object relationships throughout the diffusion process. We further propose an efficient surface alignment loss to facilitate training even in the absence of full ground-truth annotation, which is common in publicly available datasets. This loss leverages an expressive shape representation, which enables direct point sampling from intermediate shape predictions. By framing 3D scene reconstruction from a single RGB image as a conditional diffusion process, our approach surpasses current state-of-the-art methods, achieving a 12.04% improvement in AP3D on SUN RGB-D and a 13.43% increase in F-Score on Pix3D.
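The idea behind a surface alignment loss can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption for illustration: points sampled from an intermediate shape prediction are placed in the scene with a predicted pose and compared against partial ground-truth surface points using a one-sided, Chamfer-style nearest-neighbor distance. The function names `apply_pose` and `one_sided_chamfer` are hypothetical and not taken from the paper.

```python
import numpy as np


def apply_pose(points: np.ndarray, rotation: np.ndarray,
               translation: np.ndarray, scale: float) -> np.ndarray:
    """Map object-space points (N, 3) into scene space with a similarity transform."""
    # Hypothetical pose parameterization: uniform scale, rotation matrix, translation.
    return (scale * points) @ rotation.T + translation


def one_sided_chamfer(pred_points: np.ndarray, target_points: np.ndarray) -> float:
    """Mean squared distance from each target point to its nearest predicted point.

    Only the target-to-prediction direction is used, so regions of the predicted
    shape that have no ground-truth coverage are not penalized. This is an
    assumed stand-in for a loss that tolerates partial ground truth.
    """
    # Pairwise differences with shape (num_target, num_pred, 3).
    diff = target_points[:, None, :] - pred_points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # For every target point, keep the closest predicted point, then average.
    return float(sq_dist.min(axis=1).mean())


# Toy example: points sampled from a predicted shape, and a partial observation.
rng = np.random.default_rng(0)
sampled_shape_points = rng.uniform(-0.5, 0.5, size=(256, 3))
pose_rotation = np.eye(3)
pose_translation = np.array([0.0, 0.0, 2.0])

posed_points = apply_pose(sampled_shape_points, pose_rotation, pose_translation, 1.0)
partial_target = posed_points[:64]  # pretend only part of the surface is observed

loss = one_sided_chamfer(posed_points, partial_target)
```

Because the distance is measured only from the observed points to the prediction, a loss of this kind stays well defined when the ground truth covers just part of the object surface, which is the situation the abstract describes for publicly available datasets.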
