Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification
Xuenian Wang, Shanshan Shi, Renao Yan, Qiehe Sun, Lianghui Zhu, Tian Guan, Yonghong He
TL;DR
The paper addresses how to update patch embeddings in MIL-based WSI classification. It introduces HC-FT, a heuristic clustering-driven feature fine-tuning framework that purifies positive samples and mines hard negative samples. Two rounds of clustering-based pseudo-label refinement give cleaner training signals for encoder fine-tuning, which yields more discriminative embeddings across various MIL backbones. On CAMELYON16 and BRACS, HC-FT achieves state-of-the-art bag-level AUCs of 97.13% and 85.85%, respectively, along with strong patch-level metrics, demonstrating improved robustness to noisy labels and better localization of tumor regions. The resulting task-focused feature representations are more reliable and compatible with a broad range of MIL models, which strengthens the practical value of MIL in computational pathology.
Abstract
In whole slide image (WSI) classification, multiple instance learning (MIL) is a promising approach that is commonly decoupled into feature extraction and aggregation. Within this paradigm, we observe that discriminative embeddings are crucial for aggregating patch features into the final prediction. Among feature updating strategies, task-oriented ones can capture characteristics specific to a given task. However, they are prone to overfitting and can be contaminated by samples with noisy labels. To address this issue, we propose a heuristic clustering-driven feature fine-tuning method (HC-FT) that enhances the performance of multiple instance learning by providing purified positive and hard negative samples. Our method first employs a well-trained MIL model to evaluate the confidence of patches. Patches with high confidence are then marked as positive samples, while the remaining patches are used to identify crucial negative samples. After two rounds of heuristic clustering and selection, purified positive and hard negative samples are obtained to facilitate feature fine-tuning. The proposed method is evaluated on the CAMELYON16 and BRACS datasets, achieving AUCs of 97.13% and 85.85%, respectively, and consistently outperforming all compared methods.
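The selection pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the confidence threshold, or the selection rules, so everything below is an assumption made for illustration. In particular, the plain k-means routine, the `pos_thresh` and `k` parameters, the rule "keep the candidate-positive cluster with the highest mean confidence", and the rule "treat the remaining cluster closest to the positive centroid as hard negatives" are all hypothetical stand-ins for the paper's heuristics. The function names `kmeans` and `select_samples` are likewise invented here.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering).

    Returns (labels, centroids) for the rows of x.
    """
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    # Final assignment against the last centroid update.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def select_samples(emb, conf, pos_thresh=0.9, k=2):
    """Return (purified_pos, hard_neg) patch indices for one positive slide.

    emb:  (n_patches, dim) patch embeddings from the encoder.
    conf: (n_patches,) patch confidences from a trained MIL model.
    Assumes at least k patches fall on each side of pos_thresh.
    """
    cand_pos = np.flatnonzero(conf >= pos_thresh)
    rest = np.flatnonzero(conf < pos_thresh)

    # Round 1 (assumed rule): cluster the candidate positives and keep the
    # cluster with the highest mean confidence as the purified positives.
    labels, _ = kmeans(emb[cand_pos], k)
    mean_conf = [
        conf[cand_pos][labels == j].mean() if np.any(labels == j) else -np.inf
        for j in range(k)
    ]
    pure_pos = cand_pos[labels == int(np.argmax(mean_conf))]

    # Round 2 (assumed rule): cluster the remaining patches and treat the
    # cluster whose centroid is closest to the positives as hard negatives.
    pos_center = emb[pure_pos].mean(axis=0)
    labels, centroids = kmeans(emb[rest], k)
    nearest = int(np.linalg.norm(centroids - pos_center, axis=1).argmin())
    hard_neg = rest[labels == nearest]
    return pure_pos, hard_neg
```

In a pipeline of this shape, the two returned index sets would supply the patch-level pseudo-labels used to fine-tune the encoder, after which the MIL aggregator is retrained on the updated embeddings.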
