PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors

Xirui Jin, Renbiao Jin, Boying Li, Danping Zou, Wenxian Yu

TL;DR

PlanarGS introduces a pipeline for Language-Prompted Planar Priors (LP3) that employs a pretrained vision-language segmentation model and refines its region proposals via cross-view fusion and inspection with geometric priors. Experiments show that PlanarGS reconstructs accurate and detailed 3D surfaces, consistently outperforming state-of-the-art methods by a large margin.

Abstract

Three-dimensional Gaussian Splatting (3DGS) has recently emerged as an efficient representation for novel-view synthesis, achieving impressive visual quality. However, in scenes dominated by large and low-texture regions, common in indoor environments, the photometric loss used to optimize 3DGS yields ambiguous geometry and fails to recover high-fidelity 3D surfaces. To overcome this limitation, we introduce PlanarGS, a 3DGS-based framework tailored for indoor scene reconstruction. Specifically, we design a pipeline for Language-Prompted Planar Priors (LP3) that employs a pretrained vision-language segmentation model and refines its region proposals via cross-view fusion and inspection with geometric priors. 3D Gaussians in our framework are optimized with two additional terms: a planar prior supervision term that enforces planar consistency, and a geometric prior supervision term that steers the Gaussians toward the depth and normal cues. We have conducted extensive experiments on standard indoor benchmarks. The results show that PlanarGS reconstructs accurate and detailed 3D surfaces, consistently outperforming state-of-the-art methods by a large margin. Project page: https://planargs.github.io
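
The abstract describes two supervision terms added to the photometric loss. As a rough illustration only, the sketch below shows one way a planar-consistency penalty could be measured (mean point-to-plane distance after a least-squares plane fit) and how three terms might be combined as a weighted sum. The function names, the plane-fitting formulation, and the weights `w_planar` and `w_geometric` are assumptions made for this example; they are not taken from the paper, whose actual losses may differ.

```python
import numpy as np


def fit_plane(points):
    """Least-squares plane through an (N, 3) point set with N >= 3.

    Returns a unit normal n and an offset d such that n . x + d = 0
    holds for points x on the fitted plane.
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value of the
    # centered points is the direction of least variance: the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return normal, -float(normal @ centroid)


def coplanarity_loss(points):
    """Mean absolute point-to-plane distance (hypothetical planar penalty)."""
    normal, d = fit_plane(points)
    return float(np.abs(points @ normal + d).mean())


def total_loss(photometric, planar, geometric, w_planar=1.0, w_geometric=1.0):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return photometric + w_planar * planar + w_geometric * geometric


if __name__ == "__main__":
    # Four points on the plane z = 2, then the same set with one point lifted.
    flat = np.array([[0, 0, 2], [1, 0, 2], [0, 1, 2], [1, 1, 2]], dtype=float)
    bumpy = flat.copy()
    bumpy[3, 2] = 2.5
    print(coplanarity_loss(flat), coplanarity_loss(bumpy))
```

In this toy setup the penalty is close to zero for points that lie on a common plane and grows as points leave it, which is the behavior a planar-consistency term is meant to encourage in low-texture regions where the photometric loss alone is ambiguous.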

Paper Structure

This paper contains 43 sections, 17 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: PlanarGS overview. Our method takes multi-view images and language prompts as inputs, getting planar priors through the pipeline for Language-Prompted Planar Priors (LP3). Our planar prior supervision includes plane-guided initialization, Gaussian flattening, and the co-planarity constraint, accompanied by geometric prior supervision. Both foundation models in the figure are pretrained.
  • Figure 2: Pipeline for Language-Prompted Planar Priors (LP3). (a) We employ cross-view fusion to supplement bounding box proposals. (b)(c) Prior normal and plane-distance maps are incorporated for geometric inspection. Consequently, we obtain abundant and accurate planar priors from this robust pipeline.
  • Figure 3: Qualitative comparison. We present reconstructed meshes from other methods and PlanarGS. The right column shows our colored meshes. The results demonstrate that PlanarGS achieves more accurate and comprehensive high-fidelity mesh reconstruction.
  • Figure 4: Novel view synthesis comparison. With the introduction of planar and geometric supervision, PlanarGS can effectively eliminate artifacts present in other methods.
  • Figure 5: Ablation of planar priors. ZeroPlane tends to wrongly segment cluttered objects into planar regions, while using GroundedSAM without the pipeline for Language-Prompted Planar Priors (LP3) cannot distinguish different planes within a single object. Neither of them can provide reliable planar priors for Gaussian optimization.
  • ...and 13 more figures
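
The Figure 2 caption mentions prior normal and plane-distance maps used for geometric inspection. The sketch below shows one common way such a map can be derived, under the assumption that "plane distance" means the distance from the camera center to the local tangent plane at each pixel: back-project the pixel with its depth, then take the absolute dot product with the unit normal. The function name `plane_distance_map`, the pinhole intrinsics `K`, and this interpretation are assumptions for illustration, not details confirmed by the text above.

```python
import numpy as np


def plane_distance_map(depth, normals, K):
    """Per-pixel distance from the camera center to the local plane.

    depth:   (H, W) z-depth in camera coordinates.
    normals: (H, W, 3) unit normals in camera coordinates.
    K:       (3, 3) pinhole intrinsics (assumed camera model).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Rays K^-1 [u, v, 1]^T have unit z, so scaling by z-depth gives 3D points.
    rays = pixels @ np.linalg.inv(K).T
    points = rays * depth[..., None]
    # A plane with unit normal n through point P lies |n . P| from the origin.
    return np.abs(np.sum(normals * points, axis=-1))


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
    depth = np.full((3, 4), 3.0)      # fronto-parallel wall 3 units away
    normals = np.zeros((3, 4, 3))
    normals[..., 2] = -1.0            # wall faces the camera
    print(plane_distance_map(depth, normals, K))
```

Pixels on the same physical plane share both a normal and a plane distance, so a map like this gives a simple cue for checking whether a proposed region really belongs to a single plane.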