Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Yuqi Wu; Wenzhao Zheng; Jie Zhou; Jiwen Lu

TL;DR

Point3R introduces an online streaming framework for dense 3D reconstruction using an explicit spatial pointer memory that binds memory to 3D coordinates. It adds a 3D hierarchical position embedding and a memory fusion mechanism to enable efficient, scalable integration of new frames into a growing global coordinate system. The approach demonstrates competitive or state-of-the-art performance across dense reconstruction, monocular/video depth estimation, and camera pose tasks with low training cost, and shows robustness to long sequences and unordered inputs. Ablation studies validate the contributions of the pointer memory, 3D position embedding, and fusion strategy. This work offers a practical, interpretable memory mechanism for online 3D scene understanding in dynamic environments.

Abstract

Dense 3D scene reconstruction from ordered sequences or unordered image collections is a critical step in bringing computer vision research into practical scenarios. Following the paradigm introduced by DUSt3R, which densely unifies an image pair into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may lose information from earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. Specifically, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates nearby scene information in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction, and a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs. Code: https://github.com/YkiWu/Point3R.
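
To make the idea of a memory bound to 3D coordinates more concrete, the following is a minimal sketch of one way such a store could behave. It is not the Point3R implementation: the abstract does not specify the fusion rule, so the distance threshold `merge_radius`, the running-average update, and the names `PointerMemory` and `fuse` are all assumptions chosen for illustration.

```python
import math


class PointerMemory:
    """Toy explicit spatial memory: each pointer pairs a 3D position with a feature.

    Hypothetical sketch only; the fusion rule below is an assumption.
    """

    def __init__(self, merge_radius):
        self.merge_radius = merge_radius  # assumed fusion threshold (scene units)
        self.positions = []  # (x, y, z) tuples in the global coordinate system
        self.features = []   # one feature vector (list of floats) per pointer
        self.counts = []     # number of observations fused into each pointer

    def _nearest(self, pos):
        """Return (index, distance) of the closest pointer, or (None, inf) if empty."""
        best, best_d = None, math.inf
        for i, p in enumerate(self.positions):
            d = math.dist(p, pos)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def fuse(self, pos, feat):
        """Merge an observation into a nearby pointer, or add a new pointer."""
        i, d = self._nearest(pos)
        if i is not None and d <= self.merge_radius:
            # Assumed rule: running average of position and feature.
            n = self.counts[i]
            self.positions[i] = tuple(
                (n * a + b) / (n + 1) for a, b in zip(self.positions[i], pos)
            )
            self.features[i] = [
                (n * a + b) / (n + 1) for a, b in zip(self.features[i], feat)
            ]
            self.counts[i] = n + 1
            return i
        # No pointer close enough: the memory grows by one entry.
        self.positions.append(tuple(pos))
        self.features.append(list(feat))
        self.counts.append(1)
        return len(self.positions) - 1


memory = PointerMemory(merge_radius=0.5)
memory.fuse((0.0, 0.0, 0.0), [1.0, 0.0])  # first observation creates a pointer
memory.fuse((0.2, 0.0, 0.0), [0.0, 1.0])  # within the radius: fused into pointer 0
memory.fuse((5.0, 5.0, 5.0), [2.0, 2.0])  # far away: creates a second pointer
```

In this toy version, merging nearby observations keeps the number of pointers tied to the spatial extent of the scene rather than to the number of frames, which is one plausible reading of the "uniform and efficient" goal stated above.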
