Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation

Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Johannes Brünger; Reinhard Koch

TL;DR

This work tackles the challenge of obtaining high-quality labels from ambiguous real-world data by proposing a flowchart-guided annotation strategy and validating it on a biomedical vertebral fracture task. It synthesizes literature into a practical five-step workflow (What, Who, How, Annotation Process, Post-Processing) and leverages soft labels, DC3 proposals, and CleverLabel post-processing to reduce annotation bias and improve data quality. The vertebral height-reduction dataset VerSe was annotated with over 250,000 annotations, enabling a data-centric evaluation showing improved label distributions and downstream classifier performance. Overall, the work demonstrates how carefully designed annotation processes can yield high-quality datasets that drive better generalization in biomedical image classification.

Abstract

In the field of image classification, existing methods often struggle with biased or ambiguous data, a prevalent issue in real-world scenarios. Current strategies, including semi-supervised learning and class blending, offer partial solutions but lack a definitive resolution. Addressing this gap, our paper introduces a novel strategy for generating high-quality labels in challenging datasets. Central to our approach is a clearly designed flowchart, based on a broad literature review, which enables the creation of reliable labels. We validate our methodology through a rigorous real-world test case in the biomedical field, specifically in deducing height reduction from vertebral imaging. Our empirical study, leveraging over 250,000 annotations, demonstrates the effectiveness of our strategy's decisions compared to their alternatives.

Paper Structure (27 sections, 2 equations, 8 figures, 4 tables)

Introduction
Practical Guidelines: How to Annotate Ambiguous Data?
Evaluation
Applying the Strategy
Analysis
Limitations and Future Work
Conclusion
Reproducibility Statement
Full Explanation of Flowchart
Definition - What?
Definition - Who?
Definition - How?
Annotation Process
Post-Processing
Implementation and approximation details
...and 12 more sections

Figures (8)

Figure 1: Illustration of the concept of hard and soft labels and how they can be created from annotations -- The recommended process has three steps. In the first step, an image $x$ is selected for annotation. The unknown ground-truth distribution ($P(\hat{L}^x = \cdot)$) could be either soft or hard, as shown by the examples in the lower half. During the annotation, multiple annotations are created either with or without a proposal. A proposal means that one class is recommended during the annotation process. In the example, class B is proposed and 32 annotations are generated. The average across these annotations could already be used as an approximation of the soft label $P(L^x = \cdot)$; however, it might be biased towards the proposal since annotators are more likely to accept a proposal [schmarje2023spa]. Our post-processing step enhances the approximated distribution ($P(L^x = \cdot)$) from the second step by reducing this bias. In the provided example, the probability of class B is reduced since it was most likely overestimated due to the proposal of class B.
Figure 2: Flowchart with guidelines on how to annotate ambiguous data, best viewed in color.
Figure 3: Illustration of a spine and definition of height reduction classes
Figure 4: Comparison of dataset-specific variables with previously reported values [schmarje2023spa]
Figure 5: Annotation time in days based on expected consensus ratio $p_c$ and annotations per hour $\frac{a}{h}$, reported values from dc3 and with 95% agreeing votes threshold $\hat{p_c}$
...and 3 more figures
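
The three steps described in the Figure 1 caption can be illustrated with a minimal sketch. It averages 32 annotations into an approximate soft label and then lowers the probability of the proposed class. The function names, the vote counts, and the `offset` value are assumptions made for illustration only; in particular, the subtract-and-renormalize correction is a simple stand-in and not the paper's CleverLabel post-processing.

```python
from collections import Counter


def soft_label(annotations, classes):
    """Approximate P(L^x = .) as the relative frequency of each class."""
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    total = len(annotations)
    return {c: counts[c] / total for c in classes}


def reduce_proposal_bias(label, proposed, offset):
    """Hypothetical correction: lower the proposed class, then renormalize.

    `offset` is an assumed, dataset-specific estimate of how much the
    proposal inflates the proposed class. It is not taken from the paper.
    """
    adjusted = dict(label)
    adjusted[proposed] = max(label[proposed] - offset, 0.0)
    total = sum(adjusted.values())
    if total == 0:
        return dict(label)  # nothing left to renormalize, keep the input
    return {c: p / total for c, p in adjusted.items()}


# Made-up example: class B was proposed and 32 annotations were collected.
annotations = ["B"] * 24 + ["A"] * 6 + ["C"] * 2
raw = soft_label(annotations, ["A", "B", "C"])
corrected = reduce_proposal_bias(raw, proposed="B", offset=0.1)
print(raw)
print(corrected)
```

In this sketch the raw estimate for class B is 24/32 = 0.75, and the corrected estimate is lower while the distribution still sums to one, which mirrors the qualitative behavior the caption describes.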
