FLAIR-HUB: Large-scale Multimodal Dataset for Land Cover and Crop Mapping

Anatol Garioud, Sébastien Giordano, Nicolas David, Nicolas Gonthier

Abstract

The growing availability of high-quality Earth Observation (EO) data enables accurate global land cover and crop type monitoring. However, the volume and heterogeneity of these datasets pose major processing and annotation challenges. To address this, the French National Institute of Geographical and Forest Information (IGN) is exploring strategies to exploit diverse EO data, which require large annotated datasets. IGN introduces FLAIR-HUB, the largest multi-sensor land cover dataset with very-high-resolution (20 cm) annotations, covering 2528 km² of France. It combines six aligned modalities: aerial imagery, Sentinel-1 and Sentinel-2 time series, SPOT imagery, topographic data, and historical aerial images. Extensive benchmarks evaluate multimodal fusion and deep learning models (CNNs, transformers) for land cover or crop mapping, and also explore multi-task learning. The results underscore the complexity of multimodal fusion and fine-grained classification; the best land cover performance (78.2% accuracy, 65.8% mIoU) is achieved using nearly all modalities. FLAIR-HUB supports supervised and multimodal pretraining, with data and code available at https://ignf.github.io/FLAIR/flairhub.
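
The abstract reports land cover results as overall accuracy and mean Intersection over Union (mIoU). As a reading aid, the sketch below shows how these two metrics are commonly derived from a pixel-wise confusion matrix. It is a minimal illustration on toy labels, not the authors' evaluation code: the function names, the toy arrays, and the choice to skip classes absent from both prediction and reference are assumptions made here.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count (reference, predicted) label pairs over all pixels.

    Rows index the reference class, columns the predicted class.
    """
    idx = num_classes * y_true.ravel() + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def overall_accuracy(cm):
    """Fraction of pixels whose predicted class matches the reference."""
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes with an empty union are skipped, not counted as 0.
    """
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return (tp[valid] / union[valid]).mean()


# Hypothetical 2x3 label maps with three classes (toy data, not FLAIR-HUB).
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print("overall accuracy:", overall_accuracy(cm))
print("mIoU:", mean_iou(cm))
```

Because mIoU averages over classes while overall accuracy averages over pixels, rare classes weigh more heavily in mIoU, which is one reason the two reported figures can differ noticeably.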

Paper Structure

This paper contains 22 sections, 3 equations, 9 figures, 18 tables.

Figures (9)

  • Figure 1: Temporal distribution of mono-temporal modality acquisitions. From top left to bottom right: the month of acquisition for the aerial VHR images, the historical images, and the SPOT satellite images, followed by the year of acquisition for the historical data. For each domain, the size of the circle is proportional to the amount of data for that month or year.
  • Figure 2: Spatio-temporal distribution of multi-temporal modality acquisitions. We plot the number of acquisitions per area for the different STIS; areas are buffered by 5 km for visualization purposes. The acquisition orbits can be distinguished.
  • Figure 3: Example patches from the dataset, illustrating all available modalities. Only one image is shown per satellite time series. ASC denotes the ascending orbit and DESC the descending orbit.
  • Figure 4: Spatial distribution of splits in a k-fold configuration. Split 1 corresponds to the official FLAIR-HUB split (also named split_flairhub).
  • Figure 5: Architecture of the baseline UPerFuse model designed for multimodal fusion and multi-task semantic segmentation. The transparent modules correspond to auxiliary loss branches.
  • ...and 4 more figures