The P$^3$ dataset: Pixels, Points and Polygons for Multimodal Building Vectorization

Raphael Sulzer; Liuyun Duan; Nicolas Girard; Florent Lafarge

TL;DR

The paper introduces the P$^3$ dataset, a large-scale multimodal benchmark for building vectorization that jointly leverages high-resolution aerial imagery and aerial LiDAR point clouds to predict 2D building footprints. It provides data from the USA, Switzerland, and New Zealand covering 638 km$^2$, with ~10$^{10}$ LiDAR points and RGB imagery at a ground sampling distance of 25 cm, along with ground-truth 2D polygons in MS-COCO format. The authors benchmark three state-of-the-art vectorization methods (FFL, HiSup, Pix2Poly) in image-only, LiDAR-only, and multimodal fusion settings, and report metrics beyond IoU-based measures, including POLIS, HD, CD, and MTA. The study shows that LiDAR improves polygon prediction, that fusing image and LiDAR yields the best results, and that the dataset is challenging enough to motivate multimodal approaches. The data and pretrained models are publicly available for evaluation and reuse. The work highlights practical implications for scalable, accurate cadastral mapping and points to future work on broader geographic coverage and richer annotations.
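To make the polygon format and the distance-based metrics above more concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes that HD and CD refer to the Hausdorff and Chamfer distances, reads a polygon ring in the flat MS-COCO `[x1, y1, x2, y2, ...]` layout, and computes both distances on polygon vertices only, which is a simplification of a contour-based evaluation. All function names and the toy coordinates are hypothetical, and Chamfer distance conventions vary (here: the sum of the two mean nearest-vertex distances).

```python
import math


def coco_to_vertices(ring):
    """Convert a flat COCO ring [x1, y1, x2, y2, ...] into [(x1, y1), ...]."""
    return list(zip(ring[0::2], ring[1::2]))


def shoelace_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def nearest_distances(a, b):
    """For every vertex in a, the distance to its nearest vertex in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def vertex_hausdorff(a, b):
    """Symmetric Hausdorff distance, evaluated on vertices only (assumption)."""
    return max(max(nearest_distances(a, b)), max(nearest_distances(b, a)))


def vertex_chamfer(a, b):
    """Chamfer distance as the sum of both mean nearest-vertex distances."""
    d_ab = nearest_distances(a, b)
    d_ba = nearest_distances(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)


# Toy example in pixel coordinates (hypothetical values, not from the dataset).
gt = coco_to_vertices([0, 0, 10, 0, 10, 10, 0, 10])
pred = coco_to_vertices([0, 0, 10, 0, 10, 9, 0, 10])

print(shoelace_area(gt), shoelace_area(pred))  # 100.0 95.0
print(vertex_hausdorff(gt, pred))              # 1.0
print(vertex_chamfer(gt, pred))                # 0.5
```

In this toy case a single corner is displaced by one pixel, so the vertex-level Hausdorff distance is 1.0 while the Chamfer distance, which averages over all vertices, is 0.5. This illustrates why the two can rank predictions differently.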

Abstract

We present the P$^3$ dataset, a large-scale multimodal benchmark for building vectorization, constructed from aerial LiDAR point clouds, high-resolution aerial imagery, and vectorized 2D building outlines, collected across three continents. The dataset contains over 10 billion LiDAR points with decimeter-level accuracy and RGB images at a ground sampling distance of 25 centimeters. While many existing datasets primarily focus on the image modality, P$^3$ offers a complementary perspective by also incorporating dense 3D information. We demonstrate that LiDAR point clouds serve as a robust modality for predicting building polygons, both in hybrid and end-to-end learning frameworks. Moreover, fusing aerial LiDAR and imagery further improves the accuracy and geometric quality of the predicted polygons. The P$^3$ dataset is publicly available, along with code and pretrained weights of three state-of-the-art models for building polygon prediction, at https://github.com/raphaelsulzer/PixelsPointsPolygons .
