ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration

Johan Edstedt, André Mateus, Alberto Jaenal

TL;DR

ColabSfM tackles the problem of scalable map-to-map alignment in collaborative SfM by reframing registration as 3D point-cloud alignment between SfM reconstructions. It introduces a synthetic SfM registration data pipeline that generates partial reconstructions from trajectories and MegaDepth/Quad6k, enabling robust learning for geometry-only registration. The authors propose RefineRoITr, an enhanced SE(3)-invariant registration model built on RoITr with a refinement transformer, achieving superior registration performance across MegaDepth, Cambridge, 7-Scenes, and Quad6k benchmarks. This work demonstrates that descriptor-free, geometry-centric registration can enable interoperable, scalable collaborative mapping, with practical implications for cloud-assisted localization and multi-vendor map fusion, while noting limitations related to symmetric scenes and drift in partial reconstructions.

Abstract

Structure-from-Motion (SfM) is the task of estimating 3D structure and camera poses from images. We define Collaborative SfM (ColabSfM) as sharing distributed SfM reconstructions. Sharing maps requires estimating a joint reference frame, which is typically referred to as registration. However, there is a lack of scalable methods and training datasets for registering SfM reconstructions. In this paper, we tackle this challenge by proposing the scalable task of point cloud registration for SfM reconstructions. We find that current registration methods cannot register SfM point clouds when trained on existing datasets. To this end, we propose an SfM registration dataset generation pipeline, leveraging partial reconstructions from synthetically generated camera trajectories for each scene. Finally, we propose a simple but impactful neural refiner on top of the SotA registration method RoITr that yields significant improvements, which we call RefineRoITr. Our extensive experimental evaluation shows that our proposed pipeline and model enable ColabSfM. Code is available at https://github.com/EricssonResearch/ColabSfM
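
As an illustration of what registration estimates, the sketch below recovers a similarity transform (scale s, rotation R, translation t) between two point sets from already-matched 3D points, using the closed-form least-squares solution of Umeyama. This is a standard technique shown for illustration only, not the paper's learned matcher. The function name `estimate_similarity` and the assumption that correspondences are given and outlier-free are ours.

```python
import numpy as np


def estimate_similarity(P, Q):
    """Least-squares (s, R, t) such that Q ~= s * (R @ p) + t for matched rows.

    P, Q: (n, 3) arrays of corresponding 3D points (hypothetical inputs;
    row i of P is assumed to match row i of Q, with no outliers).
    """
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - mu_p, Q - mu_q

    # Cross-covariance between target and source, then its SVD.
    cov = Qc.T @ Pc / len(P)
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt

    # Scale is the ratio of aligned covariance to source variance.
    var_p = (Pc ** 2).sum() / len(P)
    s = np.trace(np.diag(S) @ D) / var_p

    t = mu_q - s * R @ mu_p
    return s, R, t
```

Correspondences produced by a matcher usually contain outliers, so a solver of this kind is typically wrapped in a robust estimator such as RANSAC rather than applied to all matches at once.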

Paper Structure

This paper contains 25 sections, 6 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Our proposed registration paradigm for collaborative SfM reconstructions (ColabSfM). Given two input SfM reconstructions $\mathcal{P}, \mathcal{Q}$ of the same scene, the task is to estimate the relative similarity transform $(s, R, t)$ between them. Our first contribution is to address this as a point cloud registration problem, using only 3D SfM tracks. For this, we do not rely on visual descriptors, but on the 3D coordinates of the points $\mathbf{P}, \mathbf{Q}$, their normals $\mathbf{N}, \mathbf{M}$ and, optionally, features $\mathbf{X}, \mathbf{Y}$. To make point cloud registration methods perform well on this task, we propose as our second contribution a scalable pipeline to construct synthetic training datasets for SfM registration. Finally, we propose an improved version of RoITr yu2023rotation as the registration method $f_\theta$.
  • Figure 2: Qualitative comparison. We compare our approach to the previous point cloud registration method OverlapPredator huang2021predator on the St. Peter's Basilica test scene. Without training on our proposed SfM registration dataset (column 1), previous methods produce neither sufficiently good matches (top row) nor accurate relative pose estimates (bottom row). In contrast, our proposed model RefineRoITr, trained on the proposed dataset, finds better matches and hence registers the scenes well. The source and target point clouds are depicted in yellow and blue, respectively.
  • Figure 3: Overview of our pipeline. For each scene, from a random image SfM dataset, e.g., MegaDepth li2018megadepth, we retriangulate partial reconstructions using partial trajectories from the scene. Since these trajectories are in the global SfM reference frame, the relative transformation is simply the identity mapping.
  • Figure 4: Example of synthetic trajectories in our proposed dataset. We start with a large-scale scene, consisting of hundreds of cameras (left). From this set of cameras we run Algorithm 1 until the remaining set of cameras is small. Plotting the trajectories shows that plausible camera motion is achieved by this procedure (right). Sampling in this way bridges the gap between random image collections and video-based trajectories.
  • Figure 5: Overview of our proposed model RefineRoITr. As input we take two point clouds $\mathcal{P} =(\mathbf{P},\mathbf{N}, \mathbf{X})$, $\mathcal{Q} =(\mathbf{Q},\mathbf{M}, \mathbf{Y})$, consisting of 3D points $\mathbf{P}/\mathbf{Q}\in\mathbb{R}^{n/m\times 3}$, and normals $\mathbf{N}/\mathbf{M}\in\mathbb{R}^{n/m\times 3}$. In this work the features $\mathbf{X}/\mathbf{Y}$ are always assumed to be constant. We ensure rotation invariance by Point Pair Feature (PPF) encoding. The PPFs are fed together with the features into an encoder $e_{\theta}$, producing $N'$ coarse superpoints $\mathcal{P}'/\mathcal{Q}'$. These are passed through a global Transformer $g_{\theta}$, from which the top-$k$ coarse correspondences $\mathcal{C}'\in \mathbb{R}^{k\times 2}$ are extracted. A decoder $d_{\theta}$ takes the coarse features from $g_{\theta}$ and the finer features from $e_{\theta}$ and produces fine features $\hat{\mathbf{X}},\hat{\mathbf{Y}}$. Using the coarse correspondences we extract neighbourhoods $\hat{\mathbf{G}}^X,\hat{\mathbf{G}}^Y \in \mathbb{R}^{k \times 64 \times c}$, which we feed into our proposed refinement Transformer. The refined features from the Transformer are used to construct a cost matrix, where the Sinkhorn algorithm (untzelmann2013scalable, sarlin20superglue, yu2023rotation) is used to solve the optimal transport (OT) problem, producing our final matches $\mathcal{C}$. The network is trained using $\mathcal{L}_s + \mathcal{L}_p$.
  • ...and 3 more figures
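
The rotation invariance attributed to Point Pair Feature (PPF) encoding in the Figure 5 caption can be checked numerically. The sketch below computes the classic four-dimensional PPF (one distance and three angles) between a reference point and its neighbours, using only points and normals. The function names and the exact feature layout are assumptions for illustration; the encoding used inside the model may differ in its details.

```python
import numpy as np


def angle(u, v):
    """Angle in [0, pi] between corresponding rows of u and v.

    Uses atan2(|u x v|, u . v), which is numerically stabler than arccos.
    """
    cross = np.linalg.norm(np.cross(u, v), axis=-1)
    dot = (u * v).sum(axis=-1)
    return np.arctan2(cross, dot)


def point_pair_features(p, n_p, neighbors, n_neighbors):
    """Classic 4-D PPF of a reference point against k neighbours.

    p, n_p:                 (3,) reference point and its unit normal
    neighbors, n_neighbors: (k, 3) neighbour points and their unit normals
    Returns a (k, 4) array: [|d|, ang(n_p, d), ang(n_j, d), ang(n_p, n_j)],
    where d = neighbour - p.
    """
    d = neighbors - p
    n_ref = np.broadcast_to(n_p, d.shape)
    return np.stack(
        [
            np.linalg.norm(d, axis=-1),
            angle(n_ref, d),
            angle(n_neighbors, d),
            angle(n_ref, n_neighbors),
        ],
        axis=-1,
    )
```

Because every component is a length or an angle between vectors that rotate together, applying the same rotation and translation to all points (and the same rotation to all normals) leaves the features unchanged. The distance component does change under a global rescaling, whereas the three angles do not.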