RecurGS: Interactive Scene Modeling via Discrete-State Recurrent Gaussian Fusion

Wenhao Hu, Haonan Zhou, Zesheng Li, Liu Liu, Jiacheng Dong, Zhizhong Su, Gaoang Wang

TL;DR

RecurGS tackles interactive scene modeling under discrete state changes by maintaining a single evolving Gaussian scene and updating it through cross-state consistency, SE(3) pose refinement, and replay-guided recurrent optimization. The approach enables novel-state synthesis by transforming foreground objects while preserving the background, and uses visibility-guided refinement and region completion to efficiently fuse only observable regions. Experiments on synthetic and real data show competitive reconstruction quality with substantially improved update efficiency and scalability compared to all-at-once baselines such as IGFuse. The method shows potential for long-horizon, object-level interactive Gaussian worlds in vision and robotics applications.
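
To make the novel-state synthesis idea concrete, here is a minimal sketch of moving one object's Gaussians rigidly while leaving the rest of the scene untouched. It is an illustration under stated assumptions, not the authors' implementation: the function name `move_object`, the per-Gaussian `object_ids` labels, and the array layout are all hypothetical, and only Gaussian centers are handled (a full pipeline would also rotate each Gaussian's orientation or covariance).

```python
import numpy as np

def move_object(means, object_ids, target_id, R, t):
    """Return a copy of the Gaussian centers with one object rigidly moved.

    means      : (N, 3) Gaussian centers (hypothetical layout)
    object_ids : (N,) integer label per Gaussian (hypothetical; background has its own label)
    target_id  : label of the object to move
    R, t       : 3x3 rotation matrix and 3-vector translation
    """
    means = np.asarray(means, dtype=float)
    object_ids = np.asarray(object_ids)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)

    moved = means.copy()
    mask = object_ids == target_id
    # Row-vector form of x' = R x + t, applied to the selected Gaussians only.
    moved[mask] = means[mask] @ R.T + t
    return moved

# Two Gaussians: label 0 is background, label 1 is a movable object.
means = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
ids = np.array([0, 1])
Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
new_means = move_object(means, ids, 1, Rz90, [0.0, 0.0, 1.0])
```

Because the background Gaussians are copied unchanged, any view rendered from `new_means` differs from the original only where the moved object appears or used to appear.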

Abstract

Recent advances in 3D scene representations have enabled high-fidelity novel view synthesis, yet adapting to discrete scene changes and constructing interactive 3D environments remain open challenges in vision and robotics. Some existing approaches focus solely on updating a single scene without supporting novel-state synthesis. Others rely on diffusion-based object-background decoupling that works on one state at a time and cannot fuse information across multiple observations. To address these limitations, we introduce RecurGS, a recurrent fusion framework that incrementally integrates discrete Gaussian scene states into a single evolving representation capable of interaction. RecurGS detects object-level changes across consecutive states, aligns their geometric motion using semantic correspondence and Lie-algebra-based SE(3) refinement, and performs recurrent updates that preserve historical structures through replay supervision. A voxelized, visibility-aware fusion module selectively incorporates newly observed regions while keeping stable areas fixed, mitigating catastrophic forgetting and enabling efficient long-horizon updates. RecurGS supports object-level manipulation, synthesizes novel scene states without requiring additional scans, and maintains photorealistic fidelity across evolving environments. Extensive experiments across synthetic and real-world datasets demonstrate that our framework delivers high-quality reconstructions with substantially improved update efficiency, providing a scalable step toward continuously interactive Gaussian worlds.
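
The abstract mentions Lie-algebra based SE(3) refinement without giving the parameterization. As a hedged illustration of what such a refinement typically optimizes over, the sketch below implements the standard exponential map from a 6-vector twist to a 4x4 rigid transform. The helper names `hat` and `se3_exp` and the ordering of the twist (translation part first, rotation part second) are assumptions made here, not details taken from the paper.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ v equals the cross product w x v."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def se3_exp(xi):
    """Exponential map from a twist xi = (rho, omega) in se(3) to a 4x4 matrix in SE(3).

    Assumed convention: xi[:3] is the translational part, xi[3:] the rotation vector.
    """
    xi = np.asarray(xi, dtype=float)
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    K = hat(omega)
    if theta < 1e-8:
        # First-order approximation near the identity to avoid division by zero.
        R = np.eye(3) + K
        V = np.eye(3) + 0.5 * K
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * K + B * (K @ K)   # Rodrigues' rotation formula
        V = np.eye(3) + B * K + C * (K @ K)   # maps rho to the translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

# A small pose correction: quarter turn about z, no translational component.
T_delta = se3_exp([0.0, 0.0, 0.0, 0.0, 0.0, np.pi / 2])
```

Optimizing the unconstrained 6-vector and mapping it through `se3_exp` keeps every iterate a valid rigid transform, which is the usual reason for working in the Lie algebra rather than on the matrix entries directly.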

Paper Structure

This paper contains 34 sections, 13 equations, 12 figures, and 8 tables.

Figures (12)

  • Figure 1: Overview of our recurrent Gaussian fusion framework. Given a sequence of discrete scene states $S_0 \rightarrow S_n$, our model recurrently updates a single Gaussian scene representation by leveraging supervision from the current state and replay from past states. This produces a consistently fused scene $G_n$ that captures object-level changes across time, enabling controllable novel-state synthesis.
  • Figure 2: Efficiency vs. Quality. We compare RecurGS (Ours) against state-of-the-art methods on reconstruction quality, runtime, and GPU memory usage (bubble size). Unlike IGFuse which suffers from heavy memory overhead, our recurrent approach achieves comparable fidelity with significantly lower resource consumption.
  • Figure 3: Overview of our recurrent Gaussian scene fusion pipeline. Given two consecutive scene states, the system first performs cross-state consistency modeling through visual change localization, object association, and geometric pose alignment. The previous Gaussian scene is then recurrently optimized using supervision from the current state and replayed renderings of earlier states. Newly observed regions are completed using 3D initialization from a separately reconstructed scene, and visibility-guided refinement updates only object-relevant Gaussians, resulting in an efficient and consistent fused representation. Here, $R$ denotes rendered images from the Gaussian scene and $O$ denotes the observed input images.
  • Figure 4: Our method updates only the regions corresponding to moving objects, improving optimization efficiency, while unseen areas are masked out to prevent unintended updates and forgetting of previously reconstructed content.
  • Figure 5: Qualitative comparison of novel-state synthesis on real-world (top) and synthetic (bottom) scenes. While existing methods exhibit boundary artifacts, missing background regions, or object mixing, our approach produces accurate and complete novel-state renderings that closely match the ground truth.
  • ...and 7 more figures
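
The captions for Figures 3 and 4 describe supervision from the current state combined with replay from earlier states, with unseen areas masked out. The sketch below shows one plausible shape for such an objective. It is not the paper's loss: the L1 photometric term, the function names `masked_l1` and `recurrent_loss`, and the `replay_weight` parameter are all assumptions made for illustration.

```python
import numpy as np

def masked_l1(rendered, target, mask):
    """Mean absolute error over pixels where mask == 1 (images are H x W x C)."""
    mask = np.asarray(mask, dtype=float)
    diff = np.abs(rendered - target) * mask[..., None]
    denom = mask.sum() * rendered.shape[-1]
    # With no visible pixels there is nothing to supervise, so return zero.
    return float(diff.sum() / denom) if denom > 0 else 0.0

def recurrent_loss(render_cur, obs_cur, vis_mask, replay_pairs, replay_weight=1.0):
    """Current-state term on visible pixels plus an averaged replay term.

    replay_pairs: list of (rendering of an earlier state from the updated scene,
                  stored reference image for that state); hypothetical structure.
    """
    loss = masked_l1(render_cur, obs_cur, vis_mask)
    if replay_pairs:
        replay = np.mean([np.abs(r - t).mean() for r, t in replay_pairs])
        loss += replay_weight * float(replay)
    return loss

# Toy 2x2 RGB images: only the top-left pixel is visible in the current state.
render = np.zeros((2, 2, 3))
observed = np.ones((2, 2, 3))
visible = np.array([[1, 0], [0, 0]])
replay = [(np.zeros((2, 2, 3)), 0.5 * np.ones((2, 2, 3)))]
total = recurrent_loss(render, observed, visible, replay, replay_weight=2.0)
```

Masking the current-state term means pixels outside the visible region contribute no error, so they cannot pull previously reconstructed content away from its earlier state, while the replay term penalizes drift on states that are no longer being observed.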