B$^3$-Seg: Camera-Free, Training-Free 3DGS Segmentation via Analytic EIG and Beta-Bernoulli Bayesian Updates

Hiromichi Kamata, Samuel Arthur Munro, Fuminori Homma

TL;DR

This work proposes B$^3$-Seg (Beta-Bernoulli Bayesian Segmentation for 3DGS), a fast and theoretically grounded method for open-vocabulary 3DGS segmentation under camera-free and training-free conditions, and demonstrates that it enables practical, interactive 3DGS segmentation with provable information efficiency.

Abstract

Interactive 3D Gaussian Splatting (3DGS) segmentation is essential for real-time editing of pre-reconstructed assets in film and game production. However, existing methods rely on predefined camera viewpoints, ground-truth labels, or costly retraining, making them impractical for low-latency use. We propose B$^3$-Seg (Beta-Bernoulli Bayesian Segmentation for 3DGS), a fast and theoretically grounded method for open-vocabulary 3DGS segmentation under camera-free and training-free conditions. Our approach reformulates segmentation as sequential Beta-Bernoulli Bayesian updates and actively selects the next view via analytic Expected Information Gain (EIG). This Bayesian formulation guarantees the adaptive monotonicity and submodularity of EIG, so greedy view selection yields a $(1{-}1/e)$ approximation to the optimal view sampling policy. Experiments on multiple datasets show that B$^3$-Seg achieves results competitive with high-cost supervised methods while completing end-to-end segmentation within a few seconds. The results demonstrate that B$^3$-Seg enables practical, interactive 3DGS segmentation with provable information efficiency.

Paper Structure

This paper contains 31 sections, 63 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: B$^3$-Seg actively selects the next view via Expected Information Gain and updates 3D labels with Beta-Bernoulli Bayesian updates. It runs in a few seconds and requires neither predefined views nor ground-truth semantic labels.
  • Figure 2: Overview of B$^3$-Seg. (1) Sample $N_{\mathrm{cand}}$ candidate views on a sphere centered at the estimated object center $\mathbf{c}_{\mathrm{obj}}$. (2) Render each candidate to compute $\mathrm{EIG}(v)$ by Eq. \ref{eq:eig}, and pick the best view $v^\star$ (red). On $v^\star$, obtain masks using Grounded SAM2 and CLIP reranking. (3) From the mask, compute $(e_{i,1},e_{i,0})$ by Eq. \ref{eq:agg-evidence} and update the Beta parameters. Steps (1)–(3) are repeated for 20 iterations. The pipeline enables camera-free, training-free, open-vocabulary 3DGS segmentation in a few seconds.
  • Figure 3: Information Gain vs. Expected Information Gain (ours). (a) The IG calculation updates the Beta posterior using SAM2 segmentation masks (Eq. \ref{eq:information-gain}). (b) Our EIG approximates the posterior update from the prior Beta distribution, avoiding SAM2 inference and enabling efficient viewpoint evaluation (Eq. \ref{eq:eig}).
  • Figure 4: Qualitative comparison on text-guided 3D segmentation. We compare our method (B$^{3}$-Seg) with prior 3DGS segmentation approaches. Our method produces cleaner and more complete object masks, especially in cluttered scenes.
  • Figure 5: Candidate-view EIG on LERF-Mask (Teatime) with the prompt "stuffed bear". Each panel shows a candidate rendering; the bottom-right inset is the current confidence map (posterior mean).
  • ...and 7 more figures
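
The loop described in the Figure 2 caption (score candidate views, pick the best one, turn its mask into per-Gaussian evidence, update the Beta parameters) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `BetaState`, `evidence_from_mask`, `stand_in_score`, and `pick_view` are hypothetical helpers, the per-Gaussian visibility weights stand in for whatever the renderer provides, and the scoring function is a simple visibility-weighted posterior variance, not the paper's analytic EIG, which is not reproduced in this listing. Only the conjugate update itself (adding foreground evidence to the first Beta parameter and background evidence to the second) is standard.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BetaState:
    """Per-Gaussian Beta(a, b) belief that a Gaussian belongs to the target."""
    a: List[float]
    b: List[float]

    @classmethod
    def uniform(cls, n: int) -> "BetaState":
        # Beta(1, 1) is the uniform prior on [0, 1].
        return cls([1.0] * n, [1.0] * n)

    def mean(self) -> List[float]:
        return [a / (a + b) for a, b in zip(self.a, self.b)]

    def variance(self) -> List[float]:
        return [a * b / ((a + b) ** 2 * (a + b + 1.0))
                for a, b in zip(self.a, self.b)]


def evidence_from_mask(visibility: Sequence[float],
                       in_mask: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Assumed aggregation: split each Gaussian's visibility weight into
    foreground evidence e1 and background evidence e0 by mask membership."""
    e1 = [w * m for w, m in zip(visibility, in_mask)]
    e0 = [w * (1.0 - m) for w, m in zip(visibility, in_mask)]
    return e1, e0


def update(state: BetaState, e1: Sequence[float], e0: Sequence[float]) -> BetaState:
    """Conjugate Beta-Bernoulli update with soft (fractional) evidence."""
    return BetaState([a + x for a, x in zip(state.a, e1)],
                     [b + x for b, x in zip(state.b, e0)])


def stand_in_score(state: BetaState, visibility: Sequence[float]) -> float:
    """Placeholder view score (NOT the paper's EIG): views that see many
    still-uncertain Gaussians score higher."""
    return sum(w * v for w, v in zip(visibility, state.variance()))


def pick_view(state: BetaState, candidates: Sequence[Sequence[float]]) -> int:
    """Greedy choice: index of the candidate view with the highest score."""
    scores = [stand_in_score(state, vis) for vis in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

In use, one iteration would call `pick_view` on the candidate visibilities, obtain a mask for the chosen view from the 2D segmenter, convert it with `evidence_from_mask`, and pass the result to `update`; the posterior mean from `BetaState.mean()` then plays the role of the confidence map shown in the Figure 5 insets.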