Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong, Junjun Jiang, Kui Jiang, Jiahan Li, Yongbing Zhang
TL;DR
This work tackles the high inference cost of multi-instance learning (MIL) on gigapixel whole-slide images (WSIs) by introducing HDMIL, a hierarchical distillation MIL framework. It combines a dynamic multi-instance network (DMIN), which operates on high-resolution WSIs and generates instance relevance masks, with a lightweight instance pre-screening network (LIPN), which operates on low-resolution WSIs and predicts patch relevance, enabling efficient inference with minimal performance loss. A Chebyshev-polynomials-based Kolmogorov-Arnold (CKA) classifier with learnable activation layers further improves classification performance. Across Camelyon16, TCGA-NSCLC, and TCGA-BRCA, HDMIL surpasses state-of-the-art MIL methods in AUC and accuracy while substantially reducing inference time (e.g., by 28.6% on Camelyon16). These results demonstrate a practical path to fast and accurate WSI classification by discarding irrelevant patches in a principled, distillation-driven manner.
Abstract
Although multi-instance learning (MIL) has succeeded in pathological image classification, it faces the challenge of high inference costs due to processing numerous patches from gigapixel whole slide images (WSIs). To address this, we propose HDMIL, a hierarchical distillation multi-instance learning framework that achieves fast and accurate classification by eliminating irrelevant patches. HDMIL consists of two key components: the dynamic multi-instance network (DMIN) and the lightweight instance pre-screening network (LIPN). DMIN operates on high-resolution WSIs, while LIPN operates on their low-resolution counterparts. During training, DMIN is trained for WSI classification while generating attention-score-based masks that indicate irrelevant patches. These masks then guide the training of LIPN to predict the relevance of each low-resolution patch. During testing, LIPN first determines the useful regions within low-resolution WSIs, which indirectly enables us to eliminate irrelevant regions in high-resolution WSIs, thereby reducing inference time without causing performance degradation. In addition, we design the first Chebyshev-polynomials-based Kolmogorov-Arnold classifier in computational pathology, which enhances the performance of HDMIL through learnable activation layers. Extensive experiments on three public datasets demonstrate that HDMIL outperforms previous state-of-the-art methods, e.g., achieving an improvement of 3.13% in AUC while reducing inference time by 28.6% on the Camelyon16 dataset.
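The test-time procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prescreen_and_classify`, the one-to-one pairing of low- and high-resolution patches, the 0.5 threshold, the fallback when nothing passes screening, and the toy scoring functions are all assumptions made for illustration. The only idea taken from the abstract is that a cheap relevance score computed on low-resolution patches decides which high-resolution patches reach the bag-level classifier.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[float]  # stand-in for a patch's pixels or features


def prescreen_and_classify(
    low_res_patches: List[Patch],
    high_res_patches: List[Patch],
    relevance_fn: Callable[[Patch], float],
    bag_classifier: Callable[[List[Patch]], float],
    threshold: float = 0.5,
) -> Tuple[float, List[int]]:
    """Classify a slide using only high-res patches whose low-res twin looks relevant.

    Assumes low_res_patches[i] and high_res_patches[i] cover the same region.
    """
    if len(low_res_patches) != len(high_res_patches):
        raise ValueError("low- and high-resolution patch lists must align")

    # Cheap screening pass on the low-resolution patches (the role LIPN plays).
    keep = [i for i, p in enumerate(low_res_patches) if relevance_fn(p) >= threshold]

    # Assumed fallback: if everything was screened out, use the full bag.
    if not keep:
        keep = list(range(len(high_res_patches)))

    # Expensive pass only on the surviving high-resolution patches (the role DMIN plays).
    kept_patches = [high_res_patches[i] for i in keep]
    return bag_classifier(kept_patches), keep


def mean(values: Patch) -> float:
    return sum(values) / len(values)


def toy_bag_classifier(patches: List[Patch]) -> float:
    # Toy stand-in for a MIL aggregator: the maximum per-patch mean.
    return max(mean(p) for p in patches)


low = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.1], [0.7, 0.6]]
high = [
    [0.1, 0.2, 0.1, 0.2],
    [0.9, 0.9, 0.8, 0.8],
    [0.0, 0.0, 0.1, 0.1],
    [0.7, 0.7, 0.6, 0.6],
]

score, kept = prescreen_and_classify(low, high, mean, toy_bag_classifier)
print(kept)  # indices of the patches that survived screening
```

In this toy run, half of the high-resolution patches never reach the classifier. That is where the inference-time saving would come from in a real pipeline, where high-resolution feature extraction dominates the cost.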
