Histologic Dataset of Normal and Atypical Mitotic Figures on Human Breast Cancer (AMi-Br)
Christof A. Bertram, Viktoria Weiss, Taryn A. Donovan, Sweta Banerjee, Thomas Conrad, Jonas Ammeling, Robert Klopfleisch, Christopher Kaltenecker, Marc Aubreville
TL;DR
AMi-Br is the first public histologic dataset of atypical mitotic figures (AMFs) in human breast cancer, enabling study of the prognostic relevance of AMFs. It compiles 3,720 mitotic figure (MF) instances from TUPAC and MIDOG, each labeled by a three-expert majority vote, yielding 832 AMFs and 2,888 normal MFs. Baseline deep-learning experiments using DenseNet-121 and EfficientNet V2-S show patch-level balanced accuracy up to 0.806 and ROC AUC up to 0.899, with balanced accuracy around 0.713 under a patient-level split, highlighting the challenge posed by class imbalance and morphologic overlap. The dataset and baselines provide a resource for AMF-focused prognostic modeling and motivate cross-tumor AMF research and whole-slide approaches.
Abstract
Assessment of the density of mitotic figures (MFs) in histologic tumor sections is an important prognostic marker for many tumor types, including breast cancer. Multiple recent works have reported that the quantity of MFs with an atypical morphology (atypical MFs, AMFs) might be an independent prognostic criterion for breast cancer. AMFs are an indicator of mutations in the genes regulating the cell cycle and can lead to aberrant chromosome constitution (aneuploidy) of the tumor cells. To facilitate further research on this topic using pattern recognition, we present the first publicly available dataset of atypical and normal MFs (AMi-Br). For this, we utilized two of the most popular MF datasets (MIDOG 2021 and TUPAC) and subclassified all MFs using a three-expert majority vote. Our final dataset consists of 3,720 MFs, split into 832 AMFs (22.4%) and 2,888 normal MFs (77.6%) across all 223 tumor cases in the combined set. We provide baseline classification experiments to investigate the consistency of the dataset, using Monte Carlo cross-validation and different strategies to combat class imbalance. We found an average balanced accuracy of up to 0.806 when using a patch-level dataset split, and up to 0.713 when using a patient-level split.
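The labeling and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the label strings, and the held-out fraction are assumptions made for illustration, and the abstract does not state the number of repetitions or the split ratio used. The sketch shows a three-rater majority vote, balanced accuracy as the mean of per-class recall, and a patient-level random split that keeps all MFs of one tumor case on the same side.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most raters (three raters, two classes: no ties)."""
    return Counter(labels).most_common(1)[0][0]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so the majority class cannot dominate the score."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def patient_level_split(case_ids, test_fraction, rng):
    """Hold out whole cases at random; no case appears in both train and test."""
    cases = sorted(set(case_ids))
    rng.shuffle(cases)
    n_test = max(1, round(len(cases) * test_fraction))
    test_cases = set(cases[:n_test])
    train_idx = [i for i, c in enumerate(case_ids) if c not in test_cases]
    test_idx = [i for i, c in enumerate(case_ids) if c in test_cases]
    return train_idx, test_idx


# Class counts from the abstract: 832 atypical, 2,888 normal.
y_true = ["atypical"] * 832 + ["normal"] * 2888
always_normal = ["normal"] * len(y_true)
plain_accuracy = sum(t == p for t, p in zip(y_true, always_normal)) / len(y_true)
trivial_balanced = balanced_accuracy(y_true, always_normal)
```

With the class counts from the abstract, a classifier that always predicts "normal" reaches a plain accuracy of about 0.776 but a balanced accuracy of only 0.5, which is why balanced accuracy is the more informative metric here. Monte Carlo cross-validation then amounts to repeating `patient_level_split` with different random seeds and averaging the resulting scores.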
