Toulouse Hyperspectral Data Set: a benchmark data set to assess semi-supervised spectral representation learning and pixel-wise classification techniques

Romain Thoreau; Laurent Risser; Véronique Achard; Béatrice Berthelot; Xavier Briottet

TL;DR

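The paper releases the Toulouse Hyperspectral Data Set, which improves on commonly used public hyperspectral benchmarks in three respects: geographical coverage, number of land cover classes, and standard train / test splits suited to semi-supervised and self-supervised learning. It targets key issues in spectral representation learning and classification over large-scale hyperspectral images with very few labeled pixels.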

Abstract

Airborne hyperspectral images can be used to map the land cover in large urban areas, thanks to their very high spatial and spectral resolutions over a wide spectral domain. While the spectral dimension of hyperspectral images is highly informative of the chemical composition of the land surface, the use of state-of-the-art machine learning algorithms to map the land cover has been severely limited by the availability of training data. To cope with the scarcity of annotations, semi-supervised and self-supervised techniques have recently attracted considerable interest in the community. Yet the publicly available hyperspectral data sets commonly used to benchmark machine learning models are not fully suited to evaluating their generalization performance, due to one or several of the following properties: a limited geographical coverage (which does not reflect the spectral diversity in metropolitan areas), a small number of land cover classes, and a lack of appropriate standard train / test splits for semi-supervised and self-supervised learning. Therefore, we release in this paper the Toulouse Hyperspectral Data Set, which stands out from other data sets in the above-mentioned respects in order to address key issues in spectral representation learning and classification over large-scale hyperspectral images with very few labeled pixels. In addition, we discuss and experiment with self-supervised techniques for spectral representation learning, including the Masked Autoencoder, and establish a baseline for pixel-wise classification achieving 85% overall accuracy and 77% F1 score. The Toulouse Hyperspectral Data Set and our code are publicly available at https://www.toulouse-hyperspectral-data-set.com and https://www.github.com/Romain3Ch216/tlse-experiments, respectively.
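
The gap between the two reported scores (85% overall accuracy, 77% F1) is typical of imbalanced land cover classes. The sketch below is only an illustration on toy labels, not the authors' evaluation code. It assumes the F1 score is macro-averaged over classes, which the abstract does not state, and the helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

def overall_accuracy(cm):
    """Fraction of correctly classified pixels, all classes pooled."""
    return np.trace(cm) / cm.sum()

def macro_f1(cm):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # TP + FP per class
    actual = cm.sum(axis=1)     # TP + FN per class
    denom = predicted + actual
    # F1 = 2 TP / (2 TP + FP + FN); classes never seen nor predicted get 0
    f1 = np.divide(2.0 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()

# Toy labels: a frequent class (0) and a rare class (1)
y_true = np.array([0] * 8 + [1] * 2)
y_pred = np.array([0] * 8 + [0, 1])

cm = confusion_matrix(y_true, y_pred, n_classes=2)
oa = overall_accuracy(cm)
mf1 = macro_f1(cm)
print(f"overall accuracy = {oa:.3f}, macro F1 = {mf1:.3f}")
```

On these toy labels, a single error on the rare class barely changes the overall accuracy but lowers the macro F1 noticeably, since every class weighs the same in the average.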

Paper Structure (13 sections, 1 equation, 16 figures, 4 tables)

This paper contains 13 sections, 1 equation, 16 figures, 4 tables.

Introduction
Construction and properties of the Toulouse Hyperspectral Data Set
Land cover ground truth
Standard training and test sets for semi-supervised learning
Python package
Comparison with publicly available data sets
Self-supervision for spectral representation learning
Self-supervised learning: an overview
Experiments on the Toulouse Hyperspectral Data Set
Experimental protocol
Experimental results
Ablation study
Conclusions and perspectives

Figures (16)

Figure 1: Area of Toulouse covered by the AI4GEO airborne hyperspectral image (in blue), our annotated ground truth (in red), and examples of reflectance spectra (clear paving stone, brown paving stone and red porous concrete, from top to bottom) measured in the field with ASD spectrometers during the CAMCATT-AI4GEO field campaign (Roupioz et al., 2023).
Figure 2: Land cover nomenclature of the Toulouse Hyperspectral Data Set
Figure 3: Random spectra of the classes orange tile and synthetic track.
Figure 4: Minimal example of Python code to load data in PyTorch loaders with the https://github.com/Romain3Ch216/TlseHypDataSet library
Figure 5: Illustration of our hand-crafted patch-wise feature extraction technique. The input is a 64 by 64 pixel hyperspectral patch. On one side, spectral indices (which include a selection of 20 spectral bands uniformly sampled along the spectral domain) are computed, resulting in 26 maps of 64 by 64 pixels. On the other side, the patch, averaged along the spectral dimension, is filtered by Gabor filters with 4 different frequencies (from 1 m$^{-1}$ to 10 m$^{-1}$) and 6 different orientations, resulting in 24 maps. From every map, spatial statistics are computed, resulting in a 400-dimensional feature vector.
...and 11 more figures
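
The pipeline described in the caption of Figure 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The caption fixes only the counts (26 index maps, 24 Gabor maps, a 400-dimensional output), so everything else is hypothetical: the 6 indices beyond the 20 sampled bands are placeholder normalized differences, the Gabor frequencies are arbitrary values in cycles per pixel (the caption's values in m$^{-1}$ are not converted), and 8 statistics per map are chosen so that 50 maps give 400 features.

```python
import numpy as np

def gabor_kernel(frequency, theta, sigma=3.0, size=15):
    """Real Gabor filter: cosine carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * frequency * x_rot)

def filter_same(image, kernel):
    """Circular convolution via FFT; output has the image's shape."""
    kh, kw = kernel.shape
    padded = np.zeros_like(image, dtype=float)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the output is not shifted
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def index_maps(patch, n_bands=20, n_indices=6):
    """20 uniformly sampled bands + 6 placeholder indices -> 26 maps."""
    n_total = patch.shape[-1]
    band_ids = np.linspace(0, n_total - 1, n_bands).round().astype(int)
    sampled = patch[..., band_ids]
    # Hypothetical indices: normalized differences between band pairs
    a, b = sampled[..., :n_indices], sampled[..., -n_indices:]
    nd = (b - a) / (b + a + 1e-8)
    return np.concatenate([sampled, nd], axis=-1)

def gabor_maps(patch, frequencies=(0.05, 0.1, 0.2, 0.4), n_orientations=6):
    """4 frequencies x 6 orientations on the band-averaged patch -> 24 maps."""
    gray = patch.mean(axis=-1)
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    maps = [filter_same(gray, gabor_kernel(f, t))
            for f in frequencies for t in thetas]
    return np.stack(maps, axis=-1)

def spatial_statistics(maps):
    """8 assumed statistics per map: mean, std, min, max, 4 quantiles."""
    flat = maps.reshape(-1, maps.shape[-1])
    stats = [flat.mean(0), flat.std(0), flat.min(0), flat.max(0),
             *np.quantile(flat, [0.1, 0.25, 0.75, 0.9], axis=0)]
    return np.stack(stats, axis=0).T.ravel()

def patch_features(patch):
    """(64, 64, B) patch -> (26 + 24) maps x 8 statistics = 400 features."""
    maps = np.concatenate([index_maps(patch), gabor_maps(patch)], axis=-1)
    return spatial_statistics(maps)

# Random stand-in for a hyperspectral patch; 100 bands is an arbitrary choice
rng = np.random.default_rng(0)
patch = rng.random((64, 64, 100))
features = patch_features(patch)
print(features.shape)  # (400,)
```

The FFT-based filtering wraps around the patch borders, which keeps the sketch dependency-free; a practical implementation would more likely use reflective padding.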
