Planar Gaussian Splatting
Farhad G. Zanjani, Hong Cai, Hanno Ackermann, Leila Mirvakhabova, Fatih Porikli
TL;DR
The paper tackles the problem of recovering explicit 3D planar geometry from multi-view RGB imagery without 3D plane labels or depth supervision. It introduces Planar Gaussian Splatting (PGS), which represents a scene with 3D Gaussian primitives organized into a Gaussian Mixture Tree (GMT) and augments each Gaussian with a learnable plane descriptor; plane instances emerge from probabilistic merges guided by Bhattacharyya distance and descriptor similarity. Plane descriptors are learned by leveraging 2D SAM masks and 2D normal predictions, using a linear-regression-based association across views and a Region Adjacency Graph to merge segments belonging to the same plane, with local planar alignment and mean-shift-based holistic separability enforcing global coherence. Experiments on ScanNetv2 and Replica show state-of-the-art planar reconstruction performance compared to supervised and optimization-based baselines, with faster inference than prior optimization approaches, while revealing some limitations in dark regions and very large planes. Overall, PGS provides a scalable, explicit, and differentiable framework for 3D planar scene understanding with potential benefits for AR/VR and robotics applications.
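To make the merge criterion mentioned above concrete, the following is a minimal sketch of the closed-form Bhattacharyya distance between two 3D Gaussians. It is not the authors' implementation: the function name `bhattacharyya_distance` and the toy disc-shaped covariances are assumptions chosen for illustration, and the descriptor-similarity term that PGS also uses is omitted.

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    cov = 0.5 * (cov1 + cov2)              # averaged covariance
    diff = mu1 - mu2
    # Mahalanobis-like term: (1/8) * diff^T cov^{-1} diff
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2)))
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return mean_term + cov_term

# Hypothetical example: two flat, disc-like Gaussians lying in the z = 0 plane.
disc = np.diag([1.0, 1.0, 1e-4])
coplanar = bhattacharyya_distance([0, 0, 0], disc, [1, 0, 0], disc)  # shifted in-plane
offset = bhattacharyya_distance([0, 0, 0], disc, [0, 0, 1], disc)    # shifted along the normal
print(coplanar, offset)
```

With these toy values, the in-plane shift yields a small distance while the same shift along the disc normal yields a distance several orders of magnitude larger, which is the behavior a plane-grouping criterion needs.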
Abstract
This paper presents Planar Gaussian Splatting (PGS), a novel neural rendering approach to learn the 3D geometry and parse the 3D planes of a scene directly from multiple RGB images. PGS leverages Gaussian primitives to model the scene and employs a hierarchical Gaussian mixture approach to group them. Similar Gaussians are progressively and probabilistically merged in the tree-structured Gaussian mixtures to identify distinct 3D plane instances and form the overall 3D scene geometry. To enable the grouping, the Gaussian primitives carry additional parameters, such as plane descriptors derived by lifting 2D masks from a general 2D segmentation model, and surface normals. Experiments show that the proposed PGS achieves state-of-the-art performance in 3D planar reconstruction without requiring either 3D plane labels or depth supervision. In contrast to existing supervised methods, which have limited generalizability and struggle under domain shift, PGS maintains its performance across datasets thanks to its neural rendering and scene-specific optimization mechanism, while also being significantly faster than existing optimization-based approaches.
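As a rough illustration of how two child Gaussians could be combined into a parent node in a tree-structured mixture, the sketch below uses standard moment matching. The abstract does not state the exact merge rule, so the moment-matching choice, the function names `merge_gaussians` and `plane_normal`, and the toy inputs are all assumptions rather than the paper's method.

```python
import numpy as np

def merge_gaussians(w1, mu1, cov1, w2, mu2, cov2):
    """Moment-matched merge of two weighted Gaussians into one parent Gaussian."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    w = w1 + w2
    a1, a2 = w1 / w, w2 / w                # normalized mixing weights
    mu = a1 * mu1 + a2 * mu2               # parent mean
    d1, d2 = mu1 - mu, mu2 - mu
    # Parent covariance: child covariances plus the spread of the child means.
    cov = a1 * (cov1 + np.outer(d1, d1)) + a2 * (cov2 + np.outer(d2, d2))
    return w, mu, cov

def plane_normal(cov):
    """Direction of least variance, usable as a plane normal estimate (sign is arbitrary)."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, 0]

# Hypothetical example: two coplanar discs in the z = 0 plane merge into a wider disc.
disc = np.diag([1.0, 1.0, 1e-4])
w, mu, cov = merge_gaussians(1.0, [0, 0, 0], disc, 1.0, [2, 0, 0], disc)
normal = plane_normal(cov)
print(mu, np.diag(cov), normal)
```

In this toy case the merged Gaussian stays thin along z and grows along x, so the least-variance direction still points along the shared plane normal.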
