Enhancing Monocular 3D Scene Completion with Diffusion Model

Changlin Song; Jiaqi Wang; Liyun Zhu; He Weng

TL;DR

FlashDreamer addresses monocular 3D scene reconstruction by completing a single image into a full 3D scene. It combines a pre-trained Flash3D-based 3D Gaussian Splatting (3DGS) model with diffusion-based inpainting, guided by a prompt generated by a vision-language model, to synthesize multi-view images and merge them into a coherent 3D representation without additional training. The method iteratively renders new viewpoints at predefined angles, inpaints the unseen regions, and merges the results using alignment masks and a loss that enforces consistency with the intermediate 3DGS; $T_i$, $R_i$, and $M$ denote the transforms, representations, and masks, respectively. Evaluations on a subset of Replica show improved FID and CLIP scores over PixelSynth and robust, view-consistent completion across various rotation angles, highlighting the method's practical potential for monocular reconstruction in VR, robotics, and autonomous driving.
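The render, inpaint, and merge loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: `render_fn`, `inpaint_fn`, and `merge_fn` are hypothetical callables standing in for the 3DGS renderer, the diffusion inpainter, and the 3DGS update step, and the mask compositing shown here is only one plausible way to keep already-reconstructed pixels while filling unseen ones.

```python
import numpy as np


def complete_view(rendered, visible_mask, inpaint_fn):
    """Fill the pixels that the current 3D representation does not cover.

    rendered:     (H, W, 3) float image rendered at a new viewpoint.
    visible_mask: (H, W) bool array, True where the render has content.
    inpaint_fn:   hypothetical stand-in for the diffusion inpainter; it
                  receives the image and the hole mask, returns (H, W, 3).
    """
    proposal = inpaint_fn(rendered, ~visible_mask)
    # Keep rendered pixels where visible, take inpainted pixels elsewhere.
    return np.where(visible_mask[..., None], rendered, proposal)


def iterative_completion(scene, angles_deg, render_fn, inpaint_fn, merge_fn):
    """Visit each predefined angle, complete the view, and merge it back.

    scene is an opaque handle to the 3D representation (an assumption of
    this sketch); merge_fn returns the updated scene after each view.
    """
    for angle in angles_deg:
        rendered, visible = render_fn(scene, angle)
        image = complete_view(rendered, visible, inpaint_fn)
        scene = merge_fn(scene, image, angle)
    return scene
```

Because each completed view is merged before the next angle is rendered, later renders in this sketch can reuse content synthesized for earlier ones, which is the property the iterative scheme relies on for view consistency.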

Abstract

3D scene reconstruction is essential for applications in virtual reality, robotics, and autonomous driving, enabling machines to understand and interact with complex environments. Traditional 3D Gaussian Splatting techniques rely on images captured from multiple viewpoints to achieve optimal performance, but this dependence limits their use in scenarios where only a single image is available. In this work, we introduce FlashDreamer, a novel approach for reconstructing a complete 3D scene from a single image, significantly reducing the need for multi-view inputs. Our approach leverages a pre-trained vision-language model to generate descriptive prompts for the scene, guiding a diffusion model to produce images from various perspectives, which are then fused to form a cohesive 3D reconstruction. Extensive experiments show that our method effectively and robustly expands single-image inputs into a comprehensive 3D scene, extending monocular 3D reconstruction capabilities without further training. Our code is available at https://github.com/CharlieSong1999/FlashDreamer/tree/main.
