Noisy Annotations in Semantic Segmentation

Moshe Kimhi; Omer Kerem; Eden Grad; Ehud Rivlin; Chaim Baskin

TL;DR

This work systematically investigates noisy annotations in semantic and instance segmentation. It introduces synthetic (VIPER-N) and real-world (COCO-N, Cityscapes-N) benchmarks, plus a weak-annotation benchmark (COCO-WAN) that simulates noisy labels produced from foundation-model prompts. It defines five noise types, analyzes model robustness across architectures (including Transformer-based backbones), and shows substantial degradation in mask quality, boundary accuracy, and confidence under noise. The study also links segmentation noise to clinical risk via the ejection fraction (EF) metric on CAMUS, and provides qualitative analyses, ablations, and learning-with-noisy-labels (LNL) explorations, highlighting the need for noise-aware training, improved annotation pipelines, and robust architectures. Overall, the results underscore the gap between current LNL methods (designed primarily for classification) and the demands of spatially precise segmentation, motivating a toolkit (Benchmark-N) and future directions for resilient segmentation. The work concludes with practical recommendations and releases that enable reproducible evaluation of noisy-label robustness in real-world segmentation tasks.

Abstract

Obtaining accurate labels for instance segmentation is particularly challenging due to the complex nature of the task. Each image requires multiple annotations, covering not only the object class but also its precise spatial boundaries. These requirements raise the likelihood of errors and inconsistencies in both manual and automated annotation processes. By simulating different noise conditions, we provide a realistic scenario for assessing the robustness and generalization capabilities of instance segmentation models across segmentation tasks, and introduce COCO-N and Cityscapes-N. We also propose a benchmark for weak annotation noise, dubbed COCO-WAN, which uses foundation models and weak annotations to simulate semi-automated annotation tools and their noisy labels. This study sheds light on the quality of segmentation masks produced by various models and challenges the efficacy of popular methods designed to address learning with label noise.
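
To make "simulating different noise conditions" concrete, the following is a minimal sketch of one plausible kind of spatial label noise: growing or shrinking a binary instance mask by a few pixels and measuring the resulting IoU against the clean mask. It is an illustration only and is not the paper's noise definition; the function names (`dilate`, `erode`, `perturb_mask`, `iou`), the square structuring element, and the use of `radius` as the noise intensity are all assumptions made for this example.

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a (2r+1) x (2r+1) square element, pure NumPy."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    # Pad with background so shifted windows never read outside the image.
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask, radius):
    """Binary erosion as the complement of dilating the background.

    Pixels outside the image are treated as foreground, so a mask that
    touches the image border is not eroded from the border side.
    """
    mask = np.asarray(mask, dtype=bool)
    return ~dilate(~mask, radius)


def perturb_mask(mask, radius, rng):
    """Hypothetical boundary noise: randomly grow or shrink the mask."""
    if rng.random() < 0.5:
        return dilate(mask, radius)
    return erode(mask, radius)


def iou(a, b):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum()) / float(union)


if __name__ == "__main__":
    clean = np.zeros((20, 20), dtype=bool)
    clean[5:15, 5:15] = True  # a 10 x 10 square "object"
    rng = np.random.default_rng(0)
    for radius in (1, 2, 3):  # larger radius = stronger noise
        noisy = perturb_mask(clean, radius, rng)
        print(radius, round(iou(clean, noisy), 3))
```

Sweeping `radius` gives a simple intensity knob: the larger the perturbation, the lower the IoU between the noisy and clean masks, which is the kind of degradation a noisy-label benchmark is meant to expose.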

Paper Structure (32 sections, 1 equation, 16 figures, 14 tables)

Introduction
Annotations Noise Definition
Observed Noise Patterns and Motivation
Noise Formulation
Noise Definitions
Synthetic Dataset: VIPER
Experimental Results
Noisy Benchmarks on Real World Data
Results Across Popular Models
Implications
Weak Annotation Noise
Evaluations and Qualitative Analysis
Qualitative Analysis
Confidence and Loss Analysis
Discussion
...and 17 more sections

Figures (16)

Figure 1: Representative examples of annotation noise found in both manually labeled data (e.g., COCO [lin2014microsoft]) and weakly annotated data (e.g., OpenImages [OpenImages]). These errors include incomplete or over-extended masks and ambiguous boundaries, underscoring the pervasive challenge of noisy labels in real-world segmentation tasks.
Figure 2: Illustration of the effects of spatial noise at varying intensities.
Figure 3: Performance evaluation of Mask R-CNN on COCO, Cityscapes, and LVIS under three levels of annotation noise.
Figure 4: Examples from the VIPER-N benchmark. The top row shows the clean annotations, the second row the low-noise regime, the third row medium annotation noise, and the last row high annotation noise.
Figure 5: Two masks created by SAM [kirillov2023segany]. Although a point is the weaker annotation prompt, the box prompt contains noise and therefore produces a poor mask annotation.
...and 11 more figures
