Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments

Raman Jha, Yang Zhou, Giuseppe Loianno

TL;DR

Dynamic environments in robotics produce a data bottleneck for dense 3D reconstruction. The authors introduce an adaptive keyframe selection front-end that uses a hybrid photometric-SSIM error and a momentum-aware dynamic threshold to selectively forward only informative frames to state-of-the-art reconstruction networks. They integrate this module with Spann3r and CUT3R and demonstrate improved reconstruction quality and frame efficiency across static and dynamic datasets, with ablation analyses confirming the contribution of each component. The work advances scalable, real-time perception by enabling high-fidelity 3D world models from compressed data streams and points toward robust digital twin applications.

Abstract

In this paper, we propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The proposed method integrates two complementary modules: an error-based selection module utilizing photometric and structural similarity (SSIM) errors, and a momentum-based update module that dynamically adjusts keyframe selection thresholds according to scene motion dynamics. By dynamically curating the most informative frames, our approach addresses a key data bottleneck in real-time perception. This allows for the creation of high-quality 3D world representations from a compressed data stream, a critical step towards scalable robot learning and deployment in complex, dynamic environments. Experimental results demonstrate significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. These findings highlight a meaningful advancement toward adaptive perception systems that can dynamically respond to complex and evolving visual scenes. We evaluate our proposed adaptive keyframe selection module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. Furthermore, an extensive ablation study confirms the effectiveness of each individual component in our method, underlining their contribution to the overall performance gains.
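
The two modules described above can be illustrated with a minimal sketch. It is not the authors' implementation: the error weighting `alpha`, the initial threshold `tau0`, the momentum factor `beta`, and the exponential-moving-average update rule are all assumptions chosen for illustration, and SSIM is computed over a single global window on flattened grayscale frames rather than over local windows on RGB-D input.

```python
from statistics import fmean


def photometric_error(a, b):
    """Mean absolute intensity difference between two flattened frames in [0, 1]."""
    return fmean(abs(x - y) for x, y in zip(a, b))


def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM over one global window (a simplification of windowed SSIM)."""
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def hybrid_error(a, b, alpha=0.5):
    """Weighted sum of photometric error and SSIM dissimilarity (alpha is assumed)."""
    ssim_term = (1.0 - global_ssim(a, b)) / 2.0  # maps SSIM in [-1, 1] to [0, 1]
    return alpha * photometric_error(a, b) + (1.0 - alpha) * ssim_term


def select_keyframes(frames, tau0=0.05, beta=0.9, alpha=0.5):
    """Return indices of selected keyframes.

    A frame is kept when its hybrid error against the last keyframe exceeds
    the current threshold. The threshold then moves toward the observed
    error with momentum beta (an assumed update rule).
    """
    if not frames:
        return []
    keyframes = [0]          # the first frame is always kept
    reference = frames[0]
    tau = tau0
    for i in range(1, len(frames)):
        err = hybrid_error(reference, frames[i], alpha)
        if err > tau:
            keyframes.append(i)
            reference = frames[i]
        # Momentum update: the threshold tracks recent scene change.
        tau = beta * tau + (1.0 - beta) * err
    return keyframes


if __name__ == "__main__":
    still = [0.2, 0.4, 0.6, 0.8]
    moved = [0.8, 0.6, 0.4, 0.2]
    print(select_keyframes([still, still, still, moved, moved]))
```

Under this rule the threshold falls while the scene is static, so small changes are picked up, and rises after large changes, so bursts of motion do not flood the reconstruction network with near-duplicate frames.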

Paper Structure

This paper contains 25 sections, 7 equations, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: The proposed architecture incorporates an Adaptive Keyframe Selection module that selects informative RGB-D frames based on photometric and structural differences. This content-aware strategy reduces redundancy and improves reconstruction by focusing on keyframes that capture meaningful scene changes. It enables more efficient and accurate 3D reconstruction, especially in dynamic environments.
  • Figure 2: Qualitative results on the dynamic BONN dataset. We compare our method with the concurrent work CUT3R. Our method achieves the best qualitative results in complex and cluttered environments.
  • Figure 3: Scene-wise comparison of Keyframe Compression Ratio (KFCR) and Chamfer Distance on the BONN dataset. Our adaptive method (blue) varies its selection rate based on scene complexity, while maintaining competitive reconstruction quality compared to the static CUT3R baseline (orange).
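
The two quantities compared in Figure 3 can be sketched as follows. This is a hedged illustration only: the listing does not define KFCR, so `keyframe_ratio` assumes it is the fraction of input frames kept as keyframes, and `chamfer_distance` uses one common convention (mean unsquared nearest-neighbour distance, summed over both directions), which may differ from the convention used in the paper. Both function names are hypothetical.

```python
import math


def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two non-empty point sets.

    Brute-force O(len(p) * len(q)) nearest-neighbour search, suitable
    only for small clouds.
    """
    def one_way(src, dst):
        return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)

    return one_way(p, q) + one_way(q, p)


def keyframe_ratio(num_selected, num_total):
    """Fraction of frames kept as keyframes (assumed reading of KFCR)."""
    return num_selected / num_total


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(chamfer_distance(predicted, reference))
    print(keyframe_ratio(25, 100))
```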