An efficient framework based on large foundation model for cervical cytopathology whole slide image screening

Jialong Huang; Gaojie Li; Shichao Kan; Jianfeng Liu; Yixiong Liang

TL;DR

The paper tackles cervical cytopathology WSI screening without lesion-level annotations, using only WSI-level labels. It introduces a two-stage framework: first, a mean-pooling (MP) based patch filter scores patches with features from a frozen foundation model and selects the top-$k$ high-risk patches; second, a parameter-efficient fine-tuning (PEFT) step adapts the foundation model to cervical imagery via contrastive learning, training only a linear adapter. The resulting patch representations are fed into embedding-based MIL, achieving state-of-the-art results on the CSD dataset and strong performance on FNAC 2019, while significantly reducing training time and memory usage. This approach offers a scalable, detection-free pathway for WSI screening and can potentially extend to broader histopathology tasks, albeit with remaining challenges in interpretability and complexity.
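The first stage can be illustrated with a minimal sketch. It assumes the mean-pooling classifier is a single linear head applied to the averaged patch features, so that the bag logit equals the mean of per-patch logits and each patch gets its own score. The function names, the linear head, and the toy numbers are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Sequence


def patch_scores(features: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 bias: float = 0.0) -> List[float]:
    """Score every patch with the (assumed) linear head of a mean-pooling model."""
    return [sum(w * x for w, x in zip(weights, f)) + bias for f in features]


def bag_logit(features: Sequence[Sequence[float]],
              weights: Sequence[float],
              bias: float = 0.0) -> float:
    """Mean pooling: for a linear head, the bag logit is the mean patch score."""
    scores = patch_scores(features, weights, bias)
    return sum(scores) / len(scores)


def top_k_patches(features: Sequence[Sequence[float]],
                  weights: Sequence[float],
                  k: int,
                  bias: float = 0.0) -> List[int]:
    """Return the indices of the k highest-scoring (highest-risk) patches."""
    scores = patch_scores(features, weights, bias)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy bag of four patches with 2-dimensional frozen features (made-up values).
features = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.1]]
weights = [1.0, -1.0]
selected = top_k_patches(features, weights, k=2)
```

Only the selected patches would then be passed on to the contrastive fine-tuning stage, which is what keeps the second stage cheap when abnormal cells are sparse.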

Abstract

Current cervical cytopathology whole slide image (WSI) screening primarily relies on detection-based approaches, which are limited in performance due to the expensive and time-consuming annotation process. Multiple Instance Learning (MIL), a weakly supervised approach that relies solely on bag-level labels, can effectively alleviate these challenges. Nonetheless, MIL commonly employs frozen pretrained models or self-supervised learning for feature extraction, which suffer from low efficacy or inefficiency, respectively. In this paper, we propose an efficient framework for cervical cytopathology WSI classification using only WSI-level labels through unsupervised and weakly supervised learning. Given the sparse and dispersed nature of abnormal cells within cytopathological WSIs, we propose a strategy that leverages the pretrained foundation model to filter the top-$k$ high-risk patches. Subsequently, we suggest parameter-efficient fine-tuning (PEFT) of a large foundation model using contrastive learning on the filtered patches to enhance its representation ability for task-specific signals. By training only the added linear adapters, we enhance the learning of patch-level features with substantially reduced time and memory consumption. Experiments conducted on the CSD and FNAC 2019 datasets demonstrate that the proposed method enhances the performance of various MIL methods and achieves state-of-the-art (SOTA) performance. The code and trained models are publicly available at https://github.com/CVIU-CSU/TCT-InfoNCE.
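The contrastive step on filtered patches can be sketched as follows. The repository name suggests an InfoNCE-style objective, but the exact loss, the augmentations, the temperature value, and where the linear adapter sits are assumptions here. The sketch applies a hypothetical linear adapter to frozen features and computes InfoNCE over two views of each patch in pure Python.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def linear_adapter(feature: Sequence[float],
                   weight: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> List[float]:
    """Hypothetical trainable linear layer applied on top of a frozen feature."""
    return [dot(row, feature) + b for row, b in zip(weight, bias)]


def info_nce(anchors: Sequence[Sequence[float]],
             positives: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """InfoNCE: anchors[i] should match positives[i]; other positives act as negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Two toy frozen features passed through an identity adapter (made-up values).
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
frozen = [[3.0, 0.0], [0.0, 2.0]]
views = [l2_normalize(linear_adapter(f, identity, zero_bias)) for f in frozen]
aligned_loss = info_nce(views, views)        # matching pairs: loss near zero
swapped_loss = info_nce(views, views[::-1])  # mismatched pairs: large loss
```

In an actual training loop only the adapter's `weight` and `bias` would receive gradients while the foundation model stays frozen, which is the source of the time and memory savings the abstract describes.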

Paper Structure (15 sections, 5 equations, 6 figures, 6 tables)

Introduction
Related work
Cervical WSI Classification
MIL in WSI Classification
Foundation Models
Methods
Background
Method Framework
MP-based Patch Filter
Contrastive Learning with Linear Adaptation
Experiment
Dataset and experiments settings
Comparisons with the SOTA methods
Ablation Experiments
Conclusion

Figures (6)

Figure 1: Three deep learning paradigms for automated diagnosis on cervical cytopathology WSI: (a) Detection-based cancer screening framework based on abnormal cell annotations; (b) MIL methods for cancer screening with patch-level features extracted from frozen ResNet model; (c) Our proposed cancer screening with large foundation model adapted to cervical cytopathology WSI dataset.
Figure 2: Framework of our method. (a) The top-$k$ high-risk patches are selected by the filter strategy and used to train the adapter attached to the frozen image encoder via contrastive learning. All patches are then passed through the feature extractor with the adapter to obtain patch features, and the patch features from one WSI are fed to the MIL network to make the final prediction. (b) Our filter strategy trains an MP-based method and uses its patch scores to filter patches.
Figure 3: Visualization of the influence of image patches on the class token. The first column shows the original images, the second column highlights the lesion areas with boxes, and the third column displays the attention weights of the image patches with respect to the class token.
Figure 4: The ROC curves of different methods. The first three curves represent results obtained using BiomedCLIP-SimCLR, BiomedCLIP, and ResNet50 image encoders to extract features, which are then integrated through ABMIL to obtain the final results. The last curve follows the approach described in cao2023detection for WSI classification.
Figure 5: Performance of features extracted by different filter strategies. The extracted features are aggregated by four different MIL methods.
...and 1 more figures
