Data-Centric Benchmark for Label Noise Estimation and Ranking in Remote Sensing Image Segmentation

Keiller Nogueira, Codrut-Andrei Diaconu, Dávid Kerekes, Jakob Gawlikowski, Cédric Léonard, Nassim Ait Ali Braham, June Moh Goo, Zichao Zeng, Zhipeng Liu, Pallavi Jain, Andrea Nascetti, Ronny Hänsch

TL;DR

This paper introduces a novel Data-Centric benchmark, a novel, publicly available dataset, and two techniques for identifying, quantifying, and ranking training samples according to their level of label noise in remote sensing semantic segmentation.

Abstract

High-quality pixel-level annotations are essential for the semantic segmentation of remote sensing imagery. However, such labels are expensive to obtain and often affected by noise, because pixel-wise annotation is labor-intensive and time-consuming, which makes it challenging for human annotators to label every pixel accurately. Annotation errors can significantly degrade the performance and robustness of modern segmentation models, motivating the need for reliable mechanisms to identify and quantify noisy training samples. This paper introduces a novel Data-Centric benchmark, together with a novel, publicly available dataset and two techniques for identifying, quantifying, and ranking training samples according to their level of label noise in remote sensing semantic segmentation. The proposed methods leverage complementary strategies based on model uncertainty, prediction consistency, and representation analysis, and consistently outperform established baselines across a range of experimental settings. The outcomes of this work are publicly available at https://github.com/keillernogueira/label_noise_segmentation.

Paper Structure (18 sections, 3 equations, 3 figures, 3 tables)


Figures (3)

  • Figure 1: Overview of the proposed benchmark. A well-curated, clean dataset is first partitioned into training and validation/test sets. A noise injection algorithm is then applied to the training set to artificially introduce label noise. By comparing the noisy and clean annotations, a ground-truth ranking of training samples is generated based on pixel-wise Intersection-over-Union (IoU). The resulting noisy training set can be subsequently processed by a noise estimation method, which assigns each image a relative label-noise score, enabling a ranking from least to most affected by annotation errors. The predicted ranking can be evaluated against the ground-truth ranking using ranking-based metrics, such as Kendall's $\tau$ [gupta2019correlation]. Finally, the estimated ranking can also be used to prioritize or select higher-quality samples under different data selection criteria or annotation budgets, enabling more robust model training and improved generalization, which may be assessed on the original clean validation/test sets.
  • Figure 2: Samples from the SpaceNet8 dataset [hansch2022spacenet]. The first row shows the RGB image; the second row presents the reference segmentation masks; and the third row shows the segmentation masks corrupted with synthetic noise. Each column shows a different type of noise. White pixels represent the building class, whereas black pixels are the background.
  • Figure 3: Comparison, in terms of Kendall’s $\tau$, between the reference ranking and the average rank position assigned to each sample across the two proposed approaches, stratified by noise type.
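
The evaluation loop described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`binary_iou`, `noise_scores`, `kendall_tau`) are hypothetical, masks are plain nested lists of 0/1 values, the reference noise score is assumed to be 1 - IoU between the clean and noisy mask, and Kendall's $\tau$ is computed in its simple tau-a form, where tied pairs count as neither concordant nor discordant.

```python
from itertools import combinations


def binary_iou(clean, noisy):
    """Foreground IoU between two same-sized binary masks (lists of rows)."""
    inter = 0
    union = 0
    for row_c, row_n in zip(clean, noisy):
        for c, n in zip(row_c, row_n):
            inter += 1 if (c and n) else 0
            union += 1 if (c or n) else 0
    # Assumption: two empty masks agree perfectly, so their IoU is 1.
    return inter / union if union else 1.0


def noise_scores(clean_masks, noisy_masks):
    """Assumed reference score per sample: 1 - IoU (higher means noisier)."""
    return [1.0 - binary_iou(c, n) for c, n in zip(clean_masks, noisy_masks)]


def kendall_tau(a, b):
    """Kendall's tau-a between two score lists of equal length (at least 2)."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data: three 2x2 building masks, clean versus synthetically corrupted.
clean_masks = [[[1, 1], [0, 0]]] * 3
noisy_masks = [
    [[1, 1], [0, 0]],  # untouched label
    [[1, 0], [0, 0]],  # half of the building removed
    [[0, 0], [1, 1]],  # building shifted entirely
]

reference = noise_scores(clean_masks, noisy_masks)
# Hypothetical output of some noise estimation method (made-up numbers).
predicted = [0.1, 0.9, 0.4]

tau = kendall_tau(reference, predicted)
print(reference, round(tau, 3))
```

In this toy case the estimator orders two of the three sample pairs the same way as the reference and one pair the opposite way, so $\tau = (2 - 1)/3 \approx 0.33$. A perfect ranking would give $\tau = 1$ and a fully reversed one $\tau = -1$.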