Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans

Romain Loiseau; Elliot Vincent; Mathieu Aubry; Loic Landrieu

TL;DR

This work introduces Learnable Earth Parser, an unsupervised framework that decomposes large aerial LiDAR scans into a small set of learnable 3D prototypes. Using S slots and K learnable prototypes with a probabilistic selection mechanism, the model reconstructs a scene as the union of deformed prototypes over its activated slots. This yields interpretable decompositions that support unsupervised instance segmentation and low-shot semantic segmentation. The model is trained with a reconstruction loss plus regularization terms: the reconstruction term relies on asymmetric Chamfer distances, and several priors prevent degenerate solutions. The approach is evaluated on a new Earth Parser Dataset comprising seven diverse aerial LiDAR scenes. Empirical results show competitive reconstruction quality and superior semantic segmentation across scenes, along with qualitative demonstrations of interpretable prototypes and instance segmentation. Overall, the work shows that scene-specific prototypes can robustly parse complex real-world 3D data, offering practical tools for environmental monitoring and mapping without annotations.
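The TL;DR mentions asymmetric Chamfer distances without giving a formula. The following is a minimal NumPy sketch, assuming that "asymmetric" means the two directional nearest-neighbor terms (input to reconstruction, and reconstruction to input) are computed separately and weighted differently. The function names and the weights `w_xy` and `w_yx` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def directional_chamfer(A: np.ndarray, B: np.ndarray) -> float:
    """Mean squared distance from each point of A (N, 3) to its nearest point in B (M, 3).

    Brute force: builds an (N, M) distance matrix, so only suitable for small clouds.
    """
    # Pairwise squared Euclidean distances, shape (N, M)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    # For every point of A, keep the distance to its closest point of B, then average
    return float(d2.min(axis=1).mean())

def asymmetric_chamfer(X: np.ndarray, Y: np.ndarray,
                       w_xy: float = 1.0, w_yx: float = 1.0) -> float:
    """Weighted sum of both directional terms (hypothetical weighting scheme).

    X: input point cloud, Y: reconstructed point cloud.
    w_xy weights "every input point is explained by the reconstruction";
    w_yx weights "every reconstructed point lies near the input".
    """
    return w_xy * directional_chamfer(X, Y) + w_yx * directional_chamfer(Y, X)
```

For example, with `X = [[0, 0, 0], [1, 0, 0]]` and `Y = [[0, 0, 0]]`, the input-to-reconstruction term is 0.5 while the reverse term is 0, which illustrates why the two directions penalize different failure modes. Lowering `w_yx` would tolerate reconstructed points in areas where the scan has no returns, such as surfaces hidden from an overhead sensor; whether the paper adopts this rationale is not stated here.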

Abstract

We propose an unsupervised method for parsing large 3D scans of real-world scenes with easily interpretable shapes. This work aims to provide a practical tool for analyzing 3D scenes in the context of aerial surveying and mapping, without the need for user annotations. Our approach is based on a probabilistic reconstruction model that decomposes an input 3D point cloud into a small set of learned prototypical 3D shapes. The resulting reconstruction is visually interpretable and can be used to perform unsupervised instance and low-shot semantic segmentation of complex scenes. We demonstrate the usefulness of our model on a novel dataset of seven large aerial LiDAR scans from diverse real-world scenarios. Our approach outperforms state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. Our code and dataset are available at https://romainloiseau.fr/learnable-earth-parser/
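The abstract states that the reconstruction supports unsupervised instance segmentation and low-shot semantic segmentation, but does not spell out the procedure. One plausible way to derive both, given here as an assumption rather than the authors' method, is to give each input point the slot and prototype of its nearest reconstructed point (slot = instance, prototype = cluster), then map each prototype to a class by majority vote over a few annotated points. The helpers `label_input_points` and `prototype_to_class` are hypothetical.

```python
import numpy as np

def label_input_points(X, recon, slot_ids, proto_ids):
    """Transfer slot (instance) and prototype ids from the reconstruction to the input.

    X: (N, 3) input points; recon: (M, 3) reconstructed points;
    slot_ids, proto_ids: (M,) integer ids of the slot and prototype behind each
    reconstructed point. Brute-force nearest neighbor, small clouds only.
    """
    d2 = ((X[:, None, :] - recon[None, :, :]) ** 2).sum(axis=-1)
    nn = d2.argmin(axis=1)  # index of the closest reconstructed point
    return slot_ids[nn], proto_ids[nn]

def prototype_to_class(proto_of_labeled, labels, num_prototypes):
    """Low-shot step (assumption): majority vote of annotated points per prototype.

    proto_of_labeled: (L,) prototype id of each annotated point;
    labels: (L,) non-negative integer class labels.
    Prototypes that receive no annotated point map to -1 (unknown).
    """
    mapping = np.full(num_prototypes, -1)
    for k in range(num_prototypes):
        votes = labels[proto_of_labeled == k]
        if votes.size:
            mapping[k] = np.bincount(votes).argmax()
    return mapping
```

Once `mapping` is computed from a handful of labeled points, a semantic label for every input point follows as `mapping[proto]`, so the annotation cost scales with the number of prototypes rather than the number of points.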

Paper Structure (54 sections, 13 equations, 12 figures, 7 tables)

Introduction
Related work
  Primitive-based point cloud decomposition.
  Decomposition of LiDAR scans.
  Aerial LiDAR datasets.
Method
  Probabilistic Scene Reconstruction Model
    Learnable shape prototypes.
    Scene reconstruction model.
    Probabilistic modeling.
  Training Losses
    Reconstruction loss.
    Regularization losses.
  Training and Implementation Details
    Model configuration.
...and 39 more sections

Figures (12)

Figure 1: Learnable Earth Parser. Our unsupervised method takes large aerial 3D scans as input and models them with a small set of learned 3D prototypes. Our approach is trained without annotations and produces legible decompositions of complex scenes, which can be used for semantic and instance segmentation.
Figure 2: Method Overview. Our model approximates an input point cloud $\mathbf{X}$ with $S$ slot models. Each slot maps $\mathbf{X}$ to an affine 3D deformation $\mathcal{T}_s(\mathbf{X})$, a slot activation probability $\alpha_s$, and the joint probabilities $\beta_s^1, \cdots, \beta_s^K$ of the slot being activated and choosing one of the $K$ learnable prototype point clouds $\mathbf{P}^1, \cdots, \mathbf{P}^K$. The output $\mathcal{M}_s(\mathbf{X})$ of an activated slot $s$ is obtained by applying the transformation $\mathcal{T}_s(\mathbf{X})$ to its most likely prototype. Non-activated slots do not contribute to the output.
Figure 3: Earth Parser Dataset. Our dataset contains 7 scenes representing various urban and natural environments acquired by aerial LiDAR. The illustrations of the power plant and the greenhouses show the complete scenes, while the others show a subset of each scene (between 25% and 50% of the total area).
Figure 4: Reconstruction Quality. We show two partial scenes with their RGB and intensity values, as well as their reconstructions by our method and competing models. We use the prototypes' intensity to color the points or pixels. As SuperQuadrics does not model intensity, we use a random color for each quadric.
Figure 5: Results on DALES (Varney et al., 2020). We report quantitative and qualitative results for one tile from DALES.
...and 7 more figures
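Figure 2 describes how the S slot outputs are combined: each activated slot applies its affine deformation to its most likely prototype, and non-activated slots contribute nothing. The NumPy sketch below mirrors that description at inference time, assuming the affine map is stored as a 3×3 linear part plus a translation, that sum over k of β_s^k equals α_s, and that a slot counts as activated when α_s reaches a cutoff of 0.5. The function name, argument layout, and threshold are hypothetical; training in the paper is probabilistic and is not reproduced here.

```python
import numpy as np

def reconstruct_scene(prototypes, linear, translation, alpha, beta, threshold=0.5):
    """Union of affinely transformed prototypes over activated slots.

    prototypes:  (K, P, 3) prototype point clouds P^1..P^K
    linear:      (S, 3, 3) linear part of each slot's affine deformation
    translation: (S, 3)    translation part of each slot's affine deformation
    alpha:       (S,)      slot activation probabilities alpha_s
    beta:        (S, K)    joint probabilities beta_s^k (activated AND prototype k)
    threshold:   hypothetical activation cutoff
    Returns the reconstructed points and, per point, its slot and prototype ids.
    """
    pts, slot_ids, proto_ids = [], [], []
    for s in range(alpha.shape[0]):
        if alpha[s] < threshold:
            continue  # non-activated slots do not contribute to the output
        k = int(np.argmax(beta[s]))  # most likely prototype for slot s
        # Apply x -> A_s x + t_s to every point of prototype k (row-vector form)
        out = prototypes[k] @ linear[s].T + translation[s]
        pts.append(out)
        slot_ids.append(np.full(len(out), s))
        proto_ids.append(np.full(len(out), k))
    if not pts:  # no slot activated: empty reconstruction
        return np.zeros((0, 3)), np.zeros(0, dtype=int), np.zeros(0, dtype=int)
    return np.concatenate(pts), np.concatenate(slot_ids), np.concatenate(proto_ids)
```

Keeping the slot and prototype id of every output point is what makes the decomposition legible: the slot id acts as an instance label and the prototype id as a shape cluster.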
