Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation
Lars Schmarje, Vasco Grossmann, Claudius Zelenka, Johannes Brünger, Reinhard Koch
TL;DR
This work tackles the challenge of obtaining high-quality labels from ambiguous real-world data by proposing a flowchart-guided annotation strategy and validating it on a biomedical vertebral fracture task. It distills the literature into a practical five-step workflow (What, Who, How, Annotation Process, Post-Processing) and uses soft labels, DC3 proposals, and CleverLabel post-processing to reduce annotation bias and improve data quality. The VerSe vertebral imaging dataset was annotated for height reduction with over 250,000 annotations, enabling a data-centric evaluation that shows improved label distributions and better downstream classifier performance. Overall, the work demonstrates how carefully designed annotation processes can yield high-quality datasets that improve generalization in biomedical image classification.
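Soft labels are central to the strategy summarized above. As a minimal sketch, assuming each image receives several independent hard class votes, a soft label can be formed from the normalized vote frequencies. The snippet also shows KL divergence as one common way to compare two such label distributions. The function names, the hypothetical 3-class example, and the choice of KL divergence are illustrative assumptions, not the paper's DC3 or CleverLabel implementation.

```python
from collections import Counter
import math


def soft_label(annotations, num_classes):
    """Turn hard votes (class indices) for one image into a soft label.

    Hypothetical helper: each entry is the fraction of annotators who
    chose that class, so the result sums to 1.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    total = len(annotations)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; eps guards against log(0)
    when q_i == 0. Used here only as an example comparison metric.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


if __name__ == "__main__":
    # Five hypothetical annotators label one image in a 3-class scheme.
    votes = [0, 0, 1, 0, 2]
    p = soft_label(votes, num_classes=3)
    reference = [0.5, 0.25, 0.25]  # hypothetical reference distribution
    print(p)
    print(kl_divergence(p, reference))
```

Keeping the full distribution instead of a majority vote preserves the disagreement between annotators, which is exactly the ambiguity a hard label would hide.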
Abstract
In the field of image classification, existing methods often struggle with biased or ambiguous data, a prevalent issue in real-world scenarios. Current strategies, including semi-supervised learning and class blending, offer partial solutions but do not fully resolve the problem. To address this gap, our paper introduces a novel strategy for generating high-quality labels in challenging datasets. Central to our approach is a clearly designed flowchart, based on a broad literature review, that enables the creation of reliable labels. We validate our methodology through a rigorous real-world test case in the biomedical field, specifically deducing height reduction from vertebral imaging. Our empirical study, leveraging over 250,000 annotations, demonstrates the effectiveness of our strategy's decisions compared to their alternatives.
