Performance of Human Annotators in Object Detection and Segmentation of Remotely Sensed Data

Roni Blushtein-Livnon; Tal Svoray; Michael Dorman

TL;DR

The findings indicate that annotators generally perform more effectively in object detection (OD) than in segmentation tasks, and provide evidence that annotators are relatively cautious and tend to identify objects only when they are confident about them, prioritizing underestimation over overestimation.

Abstract

This study introduces a laboratory experiment designed to assess the influence of annotation strategies, levels of imbalanced data, and prior experience on the performance of human annotators. The experiment focuses on labeling aerial imagery, using ArcGIS Pro tools, to detect and segment small-scale photovoltaic solar panels, selected as a case study for rectangular objects. The experiment is conducted using images with a pixel size of 0.15 m, involving both expert and non-expert participants, across different setup strategies and target-background ratio datasets. Our findings indicate that human annotators generally perform more effectively in object detection than in segmentation tasks. A marked tendency to commit more Type II errors (False Negatives, i.e., undetected objects) than Type I errors (False Positives, i.e., falsely detecting objects that do not exist) was observed across all experimental setups and conditions, suggesting a consistent bias in detection and segmentation processes. Performance was better in tasks with higher target-background ratios (i.e., more objects per unit area). Prior experience did not significantly impact performance and may, in some cases, even lead to overestimation in segmentation. These results provide evidence that human annotators are relatively cautious and tend to identify objects only when they are confident about them, prioritizing underestimation over overestimation. Annotators' performance is also influenced by object scarcity, showing a decline in areas with extremely imbalanced datasets and a low target-to-background ratio. These findings may help improve annotation strategies for remote sensing research, where efficient human annotators are crucial given the growing demand for high-quality training data to improve segmentation and detection models.

Paper Structure (30 sections, 4 equations, 10 figures, 2 tables)

Introduction
Related work
Training Sets and Human Annotators in RS
Annotation Strategy
Task Conditions - Level of Imbalanced Data
Annotators Expertise
Methods
Participants
Experimental Setup
Annotation strategy
Task Conditions
Prior Experience
Experimental Analysis
Data
Performance Evaluation Metrics
...and 15 more sections

Figures (10)

Figure 1: Challenges of small-scale PV detection from RS images: A - Low contrast of ground-based panels with their surroundings (on the left), compared with the high contrast of rooftop panels (on the right). B - The presence of adjacent objects near ground panels makes them difficult to detect. C - Shading of the panel makes it difficult to distinguish it from the target. D - The target appears in varying RGB values, making it difficult to identify. E - Resemblance to other objects: a small shade structure with a similar size and color to a solar panel. F - A striped tarpaulin sheet resembling a solar panel.
Figure 2: Experimental setup overview: A – Strategy setup: Individual annotators (a1) versus groups of 3 annotators (a2). Within the groups: Independent Process (a2 on the left) - Each annotator creates an annotation separately. The final annotation is determined by majority vote: an object marked by at least 2 annotators is included in the final annotation; Dependent Process (a2 on the right) - The first annotator passes the annotation to a reviewer who corrects it and passes the corrected product to a second reviewer, who finalizes the annotation. B – Dense-target task versus sparse-target task. Each task contains the same number of targets spread over different area sizes for varying target-background imbalance. C – Expert-weighted setup: assigning double weight to the expert annotator in the group compared to the non-expert annotators. This setup is compared to an unweighted setup (a2 left panel).
Figure 3: Confusion matrix and performance metrics in OD and segmentation: A - Components of the confusion matrix for performance evaluation. On the left: Matrix components for OD. An object is defined as a True Positive if it has at least 60% overlap with ground truth. The matrix components quantify the number of annotated objects (panels) in each category. On the right: Matrix components for segmentation. The components represent the number of annotated pixels in each category. B - On the left: A confusion matrix as a tool to represent all combinations between ground truth and annotation in binary classification. On the right: Performance metrics derived from the confusion matrix. Precision measures the ratio of correctly identified panels to all annotated panels, representing the accuracy of positive annotations; Recall measures the ratio of correctly identified panels to all actual panels, representing the model's ability to identify all relevant instances.
Figure 4: Top row: Examples from the annotation task. Annotators were asked to identify and segment solar panels. Bottom row: Examples of annotations (red rectangle); A – an unannotated panel (FN object); B – wrong detection (FP object), where the annotated object is a sun-heated boiler; C – under-segmentation (FN pixels), with the panel not fully annotated; D – over-segmentation (FP pixels), where the annotation includes the shadow of the panel.
Figure 5: Comparison of evaluation metrics between OD and segmentation tasks. All evaluation metrics are higher in OD compared with segmentation, indicating superior performance in identification compared with accurate delineation. Note that differences between the tasks are more pronounced in precision.
...and 5 more figures
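
The majority-vote rule in the Figure 2 caption and the precision and recall definitions in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation. The object IDs, the example data, and the function names `majority_vote` and `precision_recall` are hypothetical, and the sketch assumes that objects have already been matched across annotators and against ground truth (for example, using the 60% overlap rule), so each object can be represented by an ID.

```python
from collections import Counter


def majority_vote(annotations, min_votes=2):
    """Return the object IDs marked by at least `min_votes` annotators.

    `annotations` is a list with one collection of object IDs per annotator.
    """
    # Count each annotator at most once per object.
    votes = Counter(obj for marked in annotations for obj in set(marked))
    return {obj for obj, n in votes.items() if n >= min_votes}


def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: four real panels, three annotators working independently.
ground_truth = {"p1", "p2", "p3", "p4"}
annotators = [
    {"p1", "p2", "x9"},  # "x9" is a false detection by one annotator
    {"p1", "p3"},
    {"p1", "p2"},
]

final = majority_vote(annotators)   # objects marked by at least 2 of 3
tp = len(final & ground_truth)      # correctly identified panels
fp = len(final - ground_truth)      # annotated objects that are not panels
fn = len(ground_truth - final)      # panels missing from the final annotation
precision, recall = precision_recall(tp, fp, fn)
print(sorted(final), precision, recall)
```

In this made-up example the vote discards the single false detection but also drops a panel that only one annotator marked, so precision is 1.0 while recall is 0.5. The same counts-based functions apply to segmentation if the counts are pixels rather than objects.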
