Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification

Xuenian Wang; Shanshan Shi; Renao Yan; Qiehe Sun; Lianghui Zhu; Tian Guan; Yonghong He

TL;DR

The paper addresses how to update patch embeddings in MIL-based WSI classification by introducing HC-FT, a heuristic clustering-driven feature fine-tuning framework. HC-FT refines patch pseudo-labels through two rounds of clustering, purifying positive samples and mining hard negatives, so that the encoder is fine-tuned on cleaner training signals and produces more discriminative embeddings across various MIL backbones. On CAMELYON16 and BRACS, HC-FT achieves bag-level AUCs of 97.13% and 85.85%, respectively, outperforming all compared methods, along with strong patch-level metrics that indicate improved robustness to noisy labels and better localization of tumor regions. The resulting task-focused feature representations are compatible with a broad range of MIL models.

Abstract

In the field of whole slide image (WSI) classification, multiple instance learning (MIL) serves as a promising approach, commonly decoupled into feature extraction and aggregation. In this paradigm, we observe that discriminative embeddings are crucial for aggregation to the final prediction. Among all feature updating strategies, task-oriented ones can capture characteristics specific to a given task. However, they can be prone to overfitting and can be contaminated by samples assigned noisy labels. To address this issue, we propose a heuristic clustering-driven feature fine-tuning method (HC-FT) to enhance the performance of multiple instance learning by providing purified positive and hard negative samples. Our method first employs a well-trained MIL model to evaluate the confidence of patches. Then, patches with high confidence are marked as positive samples, while the remaining patches are used to identify crucial negative samples. After two rounds of heuristic clustering and selection, purified positive and hard negative samples are obtained to facilitate feature fine-tuning. The proposed method is evaluated on both the CAMELYON16 and BRACS datasets, achieving AUCs of 97.13% and 85.85%, respectively, consistently outperforming all compared methods.
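
As a rough illustration of the pipeline described in the abstract, the Python sketch below splits patches by MIL confidence and then clusters the low-confidence patches to pick hard negatives. It is not the authors' implementation: the threshold value, the use of k-means with a farthest-point initialization, the rule that the cluster nearest the positive centroid counts as "hard", and all function names are assumptions made for illustration only.

```python
import numpy as np


def split_by_confidence(scores, threshold=0.9):
    """Return indices of high-confidence and low-confidence patches.

    `threshold` is a hypothetical value, not one taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    return np.flatnonzero(scores >= threshold), np.flatnonzero(scores < threshold)


def _assign(x, centroids):
    """Label each row of x with the index of its nearest centroid."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def _farthest_point_init(x, k):
    """Deterministic init: start at x[0], then repeatedly add the farthest point."""
    centroids = [x[0]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        nearest = np.linalg.norm(x[:, None, :] - chosen[None, :, :], axis=2).min(axis=1)
        centroids.append(x[nearest.argmax()])
    return np.array(centroids, dtype=float)


def kmeans(x, k, iters=20):
    """Plain Lloyd's k-means in NumPy; returns (centroids, labels)."""
    centroids = _farthest_point_init(x, k)
    for _ in range(iters):
        labels = _assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, _assign(x, centroids)


def mine_hard_negatives(embeddings, scores, threshold=0.9, k=2):
    """Return (positive_idx, hard_negative_idx) under the assumed heuristic."""
    embeddings = np.asarray(embeddings, dtype=float)
    high, low = split_by_confidence(scores, threshold)
    if len(high) == 0 or len(low) < k:
        raise ValueError("need at least one positive and k low-confidence patches")
    pos_center = embeddings[high].mean(axis=0)
    centroids, labels = kmeans(embeddings[low], k)
    # Assumed rule: the low-confidence cluster closest to the positives is "hard".
    hard_cluster = np.linalg.norm(centroids - pos_center, axis=1).argmin()
    return high, low[labels == hard_cluster]
```

The sketch covers a single clustering pass; the second round of clustering that the abstract describes for cleaning positive samples is left out, since its selection rule is not stated here.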

Paper Structure (22 sections, 18 equations, 6 figures, 4 tables, 2 algorithms)

Introduction
Related works
Multiple instance learning for WSI classification
Self-supervised learning
End-to-end training and fine-tuning
Method
Problem statement
Heuristic clustering strategy
Pseudo label initialization
Potential negative sample mining
Pseudo label refinement
Task-oriented feature fine-tuning
Experiment
Datasets and metrics
Implementation details
...and 7 more sections

Figures (6)

Figure 1: AUC performance of various MIL models based on different pre-trained weights on the CAMELYON16 test dataset.
Figure 2: Overview of the proposed heuristic clustering-driven feature fine-tuning method. After slide preprocessing and feature extraction, a well-trained MIL model is obtained by weakly supervised learning. The MIL aggregator is then frozen to produce class-wise confidence scores, which split all patches into high- and low-confidence sets. Patches in different sets are assigned different pseudo labels. The first heuristic clustering is used for potential negative sample mining, and the second for positive sample cleaning and hard negative sample searching. With this pseudo label refinement, a patch-level dataset with pure positive samples and hard negative samples is constructed for feature encoder fine-tuning.
Figure 3: Validation and test performance variations across different training iterations on CAMELYON16 and BRACS datasets.
Figure 4: Free-response receiver operating characteristic curves with various MIL methods according to patch-level CAMELYON16 annotations.
Figure 5: Heatmap of our method in the CAMELYON16 dataset. 'test_001' and 'test_010' refer to macro-metastasis and micro-metastasis WSIs. The green lines in the first column indicate the annotated regions of WSIs. The second column presents heatmaps generated by attention scores. The third and fourth columns display sections of the annotated area and the corresponding heatmaps. Typical patches (a)-(f) in the final column are selected for further analysis.
...and 1 more figure
