Integrating Meshes and 3D Gaussians for Indoor Scene Reconstruction with SAM Mask Guidance
Jiyeop Kim, Jongwoo Lim
TL;DR
The paper tackles indoor scene reconstruction by combining room-layout meshes with 3D Gaussians for objects. It resolves the ambiguity of joint training by using Segment Anything Model (SAM) masks to assign each instance to a single primitive. It introduces a SAM-based mask loss and an additional densification stage to stabilize training and improve rendering quality, and decoupling the two primitives makes the room layout easy to edit. Evaluations on the Replica dataset show improved instance separation (GIoU/LIoU) and competitive image quality, along with demonstrations of room-layout editing. By combining explicit meshes for layout with fast Gaussians for objects, the approach offers a flexible, editable, and efficient framework for indoor scene reconstruction, with potential extensions to larger or outdoor scenes.
Abstract
We present a novel approach for 3D indoor scene reconstruction that combines 3D Gaussian Splatting (3DGS) with mesh representations. We use meshes for the room layout of the indoor scene, such as walls, ceilings, and floors, and 3D Gaussians for all other objects. This hybrid approach leverages the strengths of both representations, offering enhanced flexibility and ease of editing. However, joint training of meshes and 3D Gaussians is challenging because it is not clear which primitive should account for which part of the rendered image. Objects close to the room layout are often difficult to optimize, particularly when the room layout is textureless, which can lead to incorrect optimization and unnecessary 3D Gaussians. To overcome these challenges, we employ the Segment Anything Model (SAM) to guide the selection of primitives. The SAM mask loss constrains each instance to be represented by either Gaussians or meshes, but not both, ensuring clear separation and stable training. Furthermore, we introduce an additional densification stage that runs after the standard densification and does not reset the opacity. This stage mitigates the degradation of image quality caused by the limited number of 3D Gaussians that remain after the standard densification.
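
The idea that each SAM instance should be explained by a single primitive can be illustrated with a small sketch. The abstract does not give the loss formula, so the code below is an assumption rather than the paper's definition: it supposes that a per-pixel map `gauss_alpha` (a hypothetical name) records the fraction of each rendered pixel contributed by Gaussians, with the remainder coming from the mesh, and it penalizes every SAM mask whose mean contribution lies between 0 and 1. The function name `mask_exclusivity_loss` and the `m * (1 - m)` penalty are likewise illustrative choices.

```python
# Hypothetical sketch: NOT the paper's exact SAM mask loss. It only illustrates
# the "one primitive per SAM instance" idea described in the abstract.

def mask_exclusivity_loss(gauss_alpha, sam_masks):
    """Penalize SAM instances that are rendered by a mix of both primitives.

    gauss_alpha: 2D list of floats in [0, 1]; assumed per-pixel fraction of the
        rendered color that comes from Gaussians (the rest comes from the mesh).
    sam_masks: list of 2D lists of bools, one binary mask per SAM instance.
    """
    terms = []
    for mask in sam_masks:
        # Collect the Gaussian contribution of every pixel inside this mask.
        values = [
            alpha
            for alpha_row, mask_row in zip(gauss_alpha, mask)
            for alpha, inside in zip(alpha_row, mask_row)
            if inside
        ]
        if not values:
            continue  # skip empty masks
        mean_alpha = sum(values) / len(values)
        # Zero when the instance is all mesh (0) or all Gaussians (1),
        # largest when the two primitives share the instance equally (0.5).
        terms.append(mean_alpha * (1.0 - mean_alpha))
    return sum(terms) / len(terms) if terms else 0.0


# Toy 2x4 image: the left half is an object, the right half is a wall.
object_mask = [[True, True, False, False], [True, True, False, False]]
wall_mask = [[False, False, True, True], [False, False, True, True]]

# Clean assignment: Gaussians render the object, the mesh renders the wall.
separated = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
# Leaky assignment: some Gaussians also cover part of the wall.
leaky = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.5, 0.5]]

loss_separated = mask_exclusivity_loss(separated, [object_mask, wall_mask])
loss_leaky = mask_exclusivity_loss(leaky, [object_mask, wall_mask])
```

In this toy case the clean assignment gives a loss of zero, while the leaky one is penalized because the wall mask is partly covered by Gaussians. A penalty of this kind would discourage the unnecessary Gaussians near textureless layout surfaces that the abstract describes, although the actual loss used in the paper may differ.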
