PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching

Chen Ziwen, Zexiang Xu, Li Fuxin

TL;DR

This work proposes a novel online, point-based 3D reconstruction method from posed monocular RGB videos that maintains a global point cloud representation of the scene, continuously updating the features and 3D locations of points as new images are observed.

Abstract

We propose a novel online, point-based 3D reconstruction method from posed monocular RGB videos. Our model maintains a global point cloud representation of the scene, continuously updating the features and 3D locations of points as new images are observed. It expands the point cloud with newly detected points while carefully removing redundancies. The point cloud updates and the depth predictions for new points are achieved through a novel ray-based 2D-3D feature matching technique, which is robust against errors in previous point position predictions. In contrast to offline methods, our approach can process sequences of arbitrary length and provides real-time updates. Additionally, the point cloud imposes no pre-defined resolution or scene size constraints, and its unified global representation ensures consistency across views. Experiments on the ScanNet dataset show that our method achieves quality comparable to that of existing online MVS approaches. Project page: https://arthurhero.github.io/projects/pointrecon

Paper Structure

This paper contains 12 sections, 8 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Workflow of PointRecon. We begin with monocular depth prediction for the first image, lifting 2D points into 3D space to form the initial point cloud. For each subsequent image, we perform feature matching between the 2D image features and the 3D point cloud features to update the features and positions of the point cloud and to predict depth for the 2D image. Finally, the new points are merged with the existing point cloud.
  • Figure 2: Illustration of the scene update step (single level shown for simplicity). For a point $p_i$ in the point cloud, we uniformly sample $K$ positions along its ray and project them onto the image plane. Each projected sampled position selects $M$ nearest neighbors in the 2D feature map. We then compute the feature dot product between $p_i$ and each neighbor, along with geometric metadata, such as the distance between $p_i$ and the crossing of their camera rays. The 3D point uses both geometric information and feature similarities to determine its position adjustment along the ray.
  • Figure 3: Illustration of the depth prediction step (single level shown for simplicity). For a 2D feature point $p_i^\circ$ on the image plane, we uniformly sample $K$ positions along its camera ray. Each sampled position identifies $M$ neighboring points in the point cloud by finding the nearest rays. For each neighbor, we compute its feature dot product with $p_i^\circ$, along with geometric metadata such as the depth at the crossing of the rays. The 2D point uses both geometric data and feature similarities to predict its depth value.
  • Figure 4: Visualization of generated meshes (please magnify when viewing). We first render depth maps from the scene point cloud and then fuse them using TSDF fusion to generate a mesh. Our method produces more detailed reconstructions than previous work, though the surfaces can be less smooth because of the inherently discrete nature of point clouds. See our project page https://arthurhero.github.io/projects/pointrecon/ for incremental reconstruction videos.
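
The geometric steps named in the captions of Figures 1 and 2 (lifting pixels to 3D, sampling $K$ positions along a point's ray, projecting them into a new image, and picking $M$ nearest 2D features) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the pinhole camera model, the symmetric sampling range `half_range`, and the brute-force neighbor search are all assumptions made for illustration. The sketch also omits the multi-level matching and the geometric metadata that the captions mention, and it does not show how the scores are turned into a position adjustment.

```python
import numpy as np


def backproject(depth, intrinsics, cam_to_world):
    """Lift every pixel of a depth map to a 3D point in world coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(intrinsics).T       # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)          # scale each ray by its depth
    return pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]


def sample_along_ray(cam_center, point, num_samples, half_range):
    """Sample K positions uniformly along the ray from cam_center through point.

    Assumption: samples are centered on the current point position and span
    [-half_range, +half_range] along the ray direction.
    """
    direction = (point - cam_center) / np.linalg.norm(point - cam_center)
    offsets = np.linspace(-half_range, half_range, num_samples)
    return point + offsets[:, None] * direction    # (K, 3)


def project(points_world, intrinsics, world_to_cam):
    """Project world points into an image; returns pixel coords and depths."""
    pts_cam = points_world @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    uvw = pts_cam @ intrinsics.T
    return uvw[:, :2] / uvw[:, 2:3], pts_cam[:, 2]


def match_scores(point_feat, uv, grid_uv, grid_feats, num_neighbors):
    """For each projected sample, pick the M nearest 2D feature locations
    (brute force) and score them by feature dot product with the 3D point."""
    d2 = ((uv[:, None, :] - grid_uv[None, :, :]) ** 2).sum(axis=-1)  # (K, N)
    idx = np.argsort(d2, axis=1)[:, :num_neighbors]                  # (K, M)
    return idx, grid_feats[idx] @ point_feat                         # (K, M)
```

In this sketch, `backproject` would be called once on the first frame's predicted depth, and `sample_along_ray`, `project`, and `match_scores` would be called per point for each later frame. The depth prediction step of Figure 3 runs the same idea in the opposite direction, sampling along the ray of a 2D feature and searching for neighbors in the point cloud.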