Contrasting Low and High-Resolution Features for HER2 Scoring using Deep Learning

Ekansh Chauhan, Anila Sharma, Amit Sharma, Vikas Nishadham, Asha Ghughtyal, Ankur Kumar, Gurudutt Gupta, Anurag Mehta, C. V. Jawahar, P. K. Vinod

TL;DR

To address inter-observer variability in HER2 IHC scoring and the scarcity of annotated data, the study introduces the IPD-Breast dataset and evaluates three modeling paradigms for HER2 3-way classification: MIL-based patch aggregation, an end-to-end slide-level ConvNeXt, and patch-level classification pipelines. The end-to-end ConvNeXt on low-resolution slides delivers the strongest 3-way performance (AUC 91.79%, F1 83.52%, accuracy 83.56%), while moderate-resolution patch-level methods reach a competitive AUC of around 94% on four-way tasks, highlighting a trade-off between granularity and efficiency. Importantly, increasing resolution did not reliably improve performance, suggesting that aggregate context and patch-level decision fusion are key to accurate slide-level HER2 scoring. The findings support integrating AI-driven HER2 scoring into clinical workflows, with plans for hospital validation via an API and consideration of cross-scanner variability and labeling ambiguity.

Abstract

Breast cancer, the most common malignancy among women, requires precise detection and classification for effective treatment. Immunohistochemistry (IHC) biomarkers such as HER2, ER, and PR are critical for identifying breast cancer subtypes. However, traditional IHC classification relies on pathologists' expertise, making it labor-intensive and subject to significant inter-observer variability. To address these challenges, this study introduces the India Pathology Breast Cancer Dataset (IPD-Breast), comprising 1,272 IHC slides (HER2, ER, and PR) and aimed at automating receptor status classification. The primary focus is on developing predictive models for HER2 3-way classification (0, Low, High) to enhance prognosis. Evaluation of multiple deep learning models revealed that an end-to-end ConvNeXt network using low-resolution IHC images achieved an AUC, F1, and accuracy of 91.79%, 83.52%, and 83.56%, respectively, for 3-way classification, outperforming patch-based methods by over 5.35% in F1 score. This study highlights the potential of simple yet effective deep learning techniques to significantly improve accuracy and reproducibility in breast cancer classification, supporting their integration into clinical workflows for better patient outcomes.
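To make the reported slide-level metrics concrete, the following is a minimal sketch of how accuracy and F1 could be computed for the 3-way labels (0, Low, High), together with one simple way of fusing patch-level predictions into a slide-level label. Several things here are assumptions rather than details from the paper: the F1 score is macro-averaged, the fusion rule is a plain majority vote, and the function names `accuracy`, `macro_f1`, and `fuse_patch_predictions` are hypothetical.

```python
from collections import Counter

# HER2 3-way labels as named in the abstract.
CLASSES = ("0", "Low", "High")


def accuracy(y_true, y_pred):
    """Fraction of slides whose predicted label matches the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 (assumption: the paper may average differently)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fuse_patch_predictions(patch_preds):
    """Majority vote over patch labels (assumption: not necessarily the paper's rule).

    Ties are resolved in favor of the label that appears first in patch_preds.
    """
    if not patch_preds:
        raise ValueError("patch_preds must not be empty")
    return Counter(patch_preds).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    y_true = ["0", "Low", "High", "Low"]
    y_pred = ["0", "Low", "Low", "Low"]
    print(accuracy(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
    print(fuse_patch_predictions(["Low", "Low", "High"]))
```

A majority vote is only the simplest possible fusion rule; a weighted vote over patch probabilities or a learned aggregator would slot into the same place.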

Paper Structure

This paper contains 10 sections, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Comparison of traditional manual inspection workflow by pathologists versus a deep learning-based web application for classifying HER2, ER, and PR statuses in breast cancer tissue samples.
  • Figure 2: A few samples misclassified by Approach 2. [GT: Ground Truth, Pred: Predicted class]
  • Figure 3: A few patch-level predictions from Approach 3. The top row shows correctly predicted samples, whereas the bottom row shows misclassified samples. [GT: Ground Truth, Pred: Predicted class]
  • Figure 4: Comparison of F1 scores for different approaches. (a) Overall performance across tasks. (b) Class-wise performance in the 3-Way classification task.
  • Figure 5: Comparison of the DTFD and ConvNeXt-S approaches for (a) ER and (b) PR binary classification.
  • ...and 1 more figure