Metropolis-Hastings Sampling for 3D Gaussian Reconstruction
Hyunjin Kim, Haebeom Jung, Jaesik Park
TL;DR
This work addresses the memory and efficiency challenges of 3D Gaussian Splatting (3DGS) by reframing densification and pruning as an adaptive Metropolis-Hastings (MH) sampling problem guided by aggregated multi-view photometric errors. It derives a Bayesian posterior over Gaussian configurations, introduces per-Gaussian importance scores from opacity and view-consistent errors, and uses coarse–fine proposals to insert and relocate Gaussians with principled acceptance tests. The approach converges faster and, on Mip-NeRF360, Tanks and Temples, and Deep Blending, matches or surpasses state-of-the-art view-synthesis quality while using fewer Gaussians and less memory. This probabilistic framework reduces the dependence on heuristics in 3DGS and offers a scalable, generalizable path toward adaptive 3D scene reconstruction with real-time potential.
Abstract
We propose an adaptive sampling framework for 3D Gaussian Splatting (3DGS) that leverages comprehensive multi-view photometric error signals within a unified Metropolis-Hastings approach. Vanilla 3DGS relies heavily on heuristic density-control mechanisms (e.g., cloning, splitting, and pruning), which can lead to redundant computations or premature removal of beneficial Gaussians. Our framework overcomes these limitations by reformulating densification and pruning as a probabilistic sampling process that dynamically inserts and relocates Gaussians based on aggregated multi-view errors and opacity scores. Guided by Bayesian acceptance tests derived from these error-based importance scores, our method substantially reduces reliance on heuristics, offers greater flexibility, and adaptively infers Gaussian distributions without requiring predefined scene complexity. Experiments on benchmark datasets, including Mip-NeRF360, Tanks and Temples, and Deep Blending, show that our approach reduces the number of Gaussians needed and converges faster while matching or modestly surpassing the view-synthesis quality of state-of-the-art models.
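To make the sampling view concrete, the following is a minimal sketch of a Metropolis-style insertion step, not the paper's implementation. Everything specific in it is an assumption made for illustration: the score is taken to be a simple blend of opacity and mean per-view error with a hypothetical weight `alpha`, the proposal is a symmetric Gaussian random walk around a parent selected in proportion to its score, and `error_at` is a hypothetical callable standing in for the aggregated multi-view photometric error at a 3D position. Because the random-walk perturbation is symmetric, the acceptance probability reduces to min(1, target(candidate) / target(parent)).

```python
import numpy as np


def importance_scores(opacity, view_errors, alpha=0.5):
    """Hypothetical per-Gaussian score: blend of opacity and mean multi-view error.

    opacity:     (N,) values in [0, 1]
    view_errors: (N, V) per-view photometric errors in [0, 1]
    alpha:       illustrative blending weight (an assumption, not from the paper)
    """
    aggregated_error = view_errors.mean(axis=1)  # aggregate over the V views
    return alpha * opacity + (1.0 - alpha) * aggregated_error


def mh_insert_step(positions, scores, error_at, n_proposals, sigma, rng):
    """Propose perturbed copies of high-score Gaussians and apply a Metropolis test.

    positions: (N, 3) Gaussian centers
    scores:    (N,) non-negative importance scores
    error_at:  hypothetical callable mapping (M, 3) positions to (M,) errors,
               used here as an unnormalized target density
    """
    # Select parents in proportion to their importance score.
    probs = scores / scores.sum()
    parents = rng.choice(len(positions), size=n_proposals, p=probs)

    # Symmetric random-walk proposal around each selected parent.
    candidates = positions[parents] + rng.normal(0.0, sigma, size=(n_proposals, 3))

    # Metropolis acceptance: min(1, target(candidate) / target(parent)).
    eps = 1e-8  # guards against division by zero
    ratio = (error_at(candidates) + eps) / (error_at(positions[parents]) + eps)
    accept_prob = np.minimum(1.0, ratio)
    accepted = rng.random(n_proposals) < accept_prob
    return candidates[accepted], accept_prob


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    positions = rng.normal(size=(100, 3))
    opacity = rng.uniform(size=100)
    view_errors = rng.uniform(size=(100, 4))

    # Toy error field (an assumption): error is highest near the origin.
    def toy_error(p):
        return np.exp(-np.sum(p ** 2, axis=1))

    scores = importance_scores(opacity, view_errors)
    new_positions, _ = mh_insert_step(positions, scores, toy_error, 32, 0.1, rng)
    print(new_positions.shape)
```

In this sketch the number of accepted candidates is not fixed in advance, which mirrors the idea of inferring the Gaussian count adaptively rather than from a predefined scene complexity. The actual posterior, proposal distributions, and coarse-to-fine schedule used by the method are not specified in this section and are not reproduced here.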
