Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments
Raman Jha, Yang Zhou, Giuseppe Loianno
TL;DR
Dynamic environments in robotics produce a data bottleneck for dense 3D reconstruction. The authors introduce an adaptive keyframe selection front-end that uses a hybrid photometric-SSIM error and a momentum-aware dynamic threshold to selectively forward only informative frames to state-of-the-art reconstruction networks. They integrate this module with Spann3r and CUT3R and demonstrate improved reconstruction quality and frame efficiency across static and dynamic datasets, with ablation analyses confirming the contribution of each component. The work advances scalable, real-time perception by enabling high-fidelity 3D world models from compressed data streams and points toward robust digital twin applications.
Abstract
In this paper, we propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The method integrates two complementary modules: an error-based selection module that uses photometric and structural similarity (SSIM) errors, and a momentum-based update module that adjusts keyframe selection thresholds according to scene motion dynamics. By curating the most informative frames, our approach addresses a key data bottleneck in real-time perception and allows high-quality 3D world representations to be built from a compressed data stream, a critical step toward scalable robot learning and deployment in complex, dynamic environments. Experimental results demonstrate significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. We evaluate the module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. An extensive ablation study confirms the effectiveness of each individual component and its contribution to the overall performance gains. These findings represent a meaningful advance toward adaptive perception systems that can respond to complex and evolving visual scenes.
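To make the two modules in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes grayscale frames given as flat lists of intensities in [0, 1], a single-window (global) SSIM instead of the usual locally windowed one, a weighted sum of the photometric and SSIM terms, and an exponential moving average of past errors as the "momentum" threshold. All function names and the parameters `alpha`, `beta`, `scale`, and `init_threshold` are hypothetical and do not come from the paper.

```python
from statistics import fmean


def photometric_error(a, b):
    """Mean absolute intensity difference between two equal-length frames."""
    return fmean(abs(x - y) for x, y in zip(a, b))


def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole frame as one window (a simplification)."""
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def hybrid_error(a, b, alpha=0.5):
    """Assumed hybrid: weighted sum of photometric error and (1 - SSIM) / 2."""
    return alpha * photometric_error(a, b) + (1 - alpha) * (1 - global_ssim(a, b)) / 2


def select_keyframes(frames, alpha=0.5, beta=0.9, scale=1.0, init_threshold=0.05):
    """Return indices of frames whose error against the last keyframe
    exceeds a threshold that tracks recent errors (assumed EMA update)."""
    keyframes = [0]            # the first frame is always kept
    ref = frames[0]            # last selected keyframe
    ema = init_threshold       # running average of observed errors
    for i in range(1, len(frames)):
        err = hybrid_error(ref, frames[i], alpha)
        if err > scale * ema:  # frame is informative relative to recent motion
            keyframes.append(i)
            ref = frames[i]
        # Momentum-style update: high-motion scenes raise the threshold,
        # static scenes let it decay.
        ema = beta * ema + (1 - beta) * err
    return keyframes


if __name__ == "__main__":
    static = [0.1, 0.2, 0.3, 0.4] * 4
    changed = [0.9, 0.8, 0.1, 0.5] * 4
    print(select_keyframes([static, static, static, changed, changed]))
```

In this sketch the selected frames would then be forwarded to a reconstruction network such as Spann3r or CUT3R, while the rest are dropped. Comparing each frame against the last keyframe rather than the previous frame is a design choice of the sketch: it lets slow drift accumulate until it crosses the threshold.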
