
Unsupervised Point Cloud Registration with Self-Distillation

Christian Löwens, Thorben Funke, André Wagner, Alexandru Paul Condurache

TL;DR

This work presents a self-distillation approach to learn point cloud registration in an unsupervised fashion, which simplifies the training procedure by removing the need for initial hand-crafted features or consecutive point cloud frames as seen in related methods.

Abstract

Rigid point cloud registration is a fundamental problem and highly relevant in robotics and autonomous driving. Nowadays, deep learning methods can be trained to match a pair of point clouds, given the transformation between them. However, this training is often not scalable due to the high cost of collecting ground truth poses. Therefore, we present a self-distillation approach to learn point cloud registration in an unsupervised fashion. Here, each sample is passed to a teacher network and an augmented view is passed to a student network. The teacher includes a trainable feature extractor and a learning-free robust solver such as RANSAC. The solver enforces consistency among correspondences and optimizes for the unsupervised inlier ratio, eliminating the need for ground truth labels. Our approach simplifies the training procedure by removing the need for initial hand-crafted features or consecutive point cloud frames as seen in related methods. We show that our method not only surpasses these related methods on the RGB-D benchmark 3DMatch but also generalizes well to automotive radar, where classical features adopted by others fail. The code is available at https://github.com/boschresearch/direg.
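As a rough illustration of the learning-free part of the teacher described in the abstract, the following Python sketch matches features by nearest neighbor, runs a basic RANSAC loop with a Kabsch solver, and scores each hypothesis by its inlier ratio. It is not the authors' implementation (the repository linked above is the reference). The function names, the brute-force matching, the 3-point sampling, the iteration count, and the 0.1 inlier threshold are all assumptions made for this example.

```python
import numpy as np


def match_features(feat_src, feat_tgt):
    """For each source point, return the index of the nearest target feature."""
    # Pairwise squared Euclidean distances, shape (N_src, N_tgt).
    d2 = ((feat_src[:, None, :] - feat_tgt[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kabsch(src, tgt):
    """Least-squares rigid transform (R, t) such that R @ src_i + t ~= tgt_i."""
    src_c, tgt_c = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - src_c).T @ (tgt - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t


def ransac_registration(src, tgt, corr, n_iters=500, inlier_thresh=0.1, seed=0):
    """Return (R, t, inlier_ratio) of the best hypothesis found.

    corr[i] is the index in tgt matched to src[i]. The threshold and the
    number of iterations are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    src_m, tgt_m = src, tgt[corr]
    best = (np.eye(3), np.zeros(3), 0.0)
    for _ in range(n_iters):
        # Minimal sample: three correspondences define a rigid transform.
        idx = rng.choice(len(src_m), size=3, replace=False)
        R, t = kabsch(src_m[idx], tgt_m[idx])
        residuals = np.linalg.norm(src_m @ R.T + t - tgt_m, axis=1)
        # The inlier ratio needs no ground truth pose, only the hypothesis.
        inlier_ratio = (residuals < inlier_thresh).mean()
        if inlier_ratio > best[2]:
            best = (R, t, inlier_ratio)
    return best
```

Here `match_features` would receive the per-point feature vectors predicted by the teacher's feature extractor, and its output would be passed as `corr` to `ransac_registration`. A production pipeline would more likely use a KD-tree for matching and an established RANSAC implementation.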

Paper Structure (16 sections, 4 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Motivation for Unsupervised Point Cloud Registration. Especially in the automotive context, collecting ground truth poses is costly, so labeled datasets are limited in size. Crowdsourced data from consumer-grade cars, on the other hand, contains orders of magnitude more unlabeled data. Using our approach, we can leverage this data and generate pseudo labels with a quality close to ground truth.
  • Figure 2: We simplify the SGP algorithm by removing the verifier and its classical features used for bootstrapping. We also reinterpret its student-teacher analogy in view of self-distillation. While dark boxes indicate trainable methods, the teacher's feature extractor is not trained but instead updated using an exponential moving average (EMA).
  • Figure 3: Self-distillation for registration (DiReg). Both point clouds are passed to the teacher, while the student receives the augmented views. The networks, FCGF (Choy et al., 2019) feature extractors, predict geometric features for all points in their pairs, and we collect correspondences by searching for the nearest neighbors among the feature vectors of the teacher. Given those correspondences, RANSAC estimates a transformation to align both clouds. Next, we search for nearest neighbors in the coordinate space to get improved correspondences for supervising the student. sg denotes the stop-gradient operator to illustrate that we do not backpropagate through the teacher network. Best viewed on display.
  • Figure 4: Training with and without augmentation for the teacher. The student's feature-match recall on the 3DMatch validation set during training. While the training without augmentation follows the supervised trajectory, the training with augmentation needs more epochs to overcome the bootstrap phase.
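
The captions of Figures 2 and 3 mention two steps that a short sketch may make more concrete: the EMA update of the teacher's feature extractor, and the nearest-neighbor search in coordinate space that yields improved correspondences for the student. The Python code below is a minimal illustration under stated assumptions, not the authors' code. It assumes that the EMA is taken over the student's parameters (as is common in self-distillation), that parameters are stored as a dictionary of NumPy arrays, and that `momentum=0.99` and `max_dist=0.1` are reasonable placeholder values. All names are hypothetical.

```python
import numpy as np


def ema_update(teacher_params, student_params, momentum=0.99):
    """Return new teacher parameters as an EMA of the student's parameters.

    Assumption: the teacher tracks the student's weights; no gradient
    flows into the teacher, matching the stop-gradient in Figure 3.
    """
    return {
        name: momentum * teacher_params[name]
        + (1.0 - momentum) * student_params[name]
        for name in teacher_params
    }


def refine_correspondences(src, tgt, R, t, max_dist=0.1):
    """Nearest neighbors in coordinate space after aligning src with (R, t).

    Returns an array of (src_index, tgt_index) pairs whose distance after
    alignment is below max_dist (an illustrative threshold).
    """
    aligned = src @ R.T + t
    # Pairwise squared distances between aligned source and target points.
    d2 = ((aligned[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
    nn = d2.argmin(axis=1)
    keep = np.sqrt(d2[np.arange(len(src)), nn]) < max_dist
    return np.stack([np.nonzero(keep)[0], nn[keep]], axis=1)
```

In this sketch, `R` and `t` would be the transformation estimated by the teacher's robust solver, and the returned index pairs would serve as pseudo labels for the student's loss on the augmented views.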