Multiple Instance Learning for Glioma Diagnosis using Hematoxylin and Eosin Whole Slide Images: An Indian Cohort Study
Ekansh Chauhan, Amit Sharma, Megha S Uppin, C. V. Jawahar, P. K. Vinod
TL;DR
The paper tackles glioma typing, grading, and IHC biomarker inference from H&E whole-slide images using a multiple instance learning (MIL) pipeline. It introduces the IPD-Brain Indian cohort and systematically evaluates combinations of patch-level feature extractors and MIL aggregators, finding that a ResNet-50 backbone pre-trained with Barlow Twins self-supervised learning (SSL), paired with the Double-Tier Feature Distillation (DTFD) aggregator, yields state-of-the-art performance on the IPD-Brain and TCGA-Brain datasets. The approach achieves high AUCs for multi-class glioma subtype classification, reliable grading performance, and a strong ability to predict IHC biomarkers (IDH, ATRX, TP53) and Ki-67 from H&E, with attention maps that align with pathologists' diagnostic reasoning. Importantly, the model operates on H&E slides alone, offering a cost-effective complement to molecular testing and potential applicability across diverse patient populations, supported by the newly established IPD-Brain resource. The work paves the way for broader deployment of MIL-based histopathology tools in neuro-oncology and encourages further exploration of extractor-aggregator pairings.
Abstract
The effective management of brain tumors relies on precise typing, subtyping, and grading. This study advances patient care with findings from rigorous multiple instance learning experiments across various feature extractors and aggregators in brain tumor histopathology. It establishes new performance benchmarks in glioma subtype classification across multiple datasets, including a novel dataset focused on the Indian demographic (IPD-Brain), providing a valuable resource for research. Using a ResNet-50 pretrained on histopathology datasets for feature extraction, combined with the Double-Tier Feature Distillation (DTFD) feature aggregator, our approach achieves state-of-the-art AUCs of 88.08 on the IPD-Brain dataset and 95.81 on the TCGA-Brain dataset for three-way glioma subtype classification. Moreover, it establishes new benchmarks in grading and in detecting IHC molecular biomarkers (IDH1R132H, TP53, ATRX, Ki-67) from H&E stained whole slide images for the IPD-Brain dataset. The work also highlights a significant correlation between the model's decision-making processes and the diagnostic reasoning of pathologists, underscoring its capability to mimic professional diagnostic procedures.
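To make the extractor-plus-aggregator idea concrete, the following is a minimal sketch of attention-based MIL pooling over pre-extracted patch features. It is not the DTFD aggregator used in the paper, only a simplified single-tier attention pooling of the kind that DTFD builds on. The feature dimension, the number of patches, the random weights, and the function name `attention_mil_pool` are all assumptions chosen for illustration.

```python
import numpy as np

def attention_mil_pool(patch_features, V, w):
    """Aggregate patch features of shape (n_patches, d) into one slide-level vector."""
    # Unnormalised attention score per patch: w^T tanh(V^T h_i)
    scores = np.tanh(patch_features @ V) @ w       # shape (n_patches,)
    scores = scores - scores.max()                 # subtract max for numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()   # softmax over patches
    slide_embedding = attn @ patch_features        # attention-weighted mean, shape (d,)
    return slide_embedding, attn

# Hypothetical sizes: 500 patches per slide, 1024-d features, 3 glioma subtypes.
rng = np.random.default_rng(0)
n_patches, d, hidden, n_classes = 500, 1024, 128, 3

# Stand-in for features from a frozen patch-level extractor (random here).
features = rng.normal(size=(n_patches, d))

# Untrained, randomly initialised parameters, for shape illustration only.
V = rng.normal(scale=0.01, size=(d, hidden))
w = rng.normal(size=hidden)
W_cls = rng.normal(scale=0.01, size=(d, n_classes))

slide_embedding, attn = attention_mil_pool(features, V, w)
logits = slide_embedding @ W_cls                   # one score per subtype
```

The per-patch weights in `attn` are what an attention map visualises: mapping them back to patch coordinates on the slide shows which regions contributed most to the slide-level prediction.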
