Histopathology image embedding based on foundation models features aggregation for patient treatment response prediction
Bilel Guetarni, Feryal Windal, Halim Benhabiles, Mahfoud Chaibi, Romain Dubois, Emmanuelle Leteurtre, Dominique Collard
TL;DR
This work addresses the prediction of Diffuse Large B-Cell Lymphoma treatment response from histopathology whole slide images (WSIs). It proposes a patch-based representation built from multiple foundation models pre-trained on large histopathology datasets: the per-patch features from the different models are concatenated and then aggregated via attention-based multiple instance learning (AB-MIL) to form a WSI embedding. On a dataset of 152 patients and 384 WSIs, foundation-model features outperform ImageNet baselines, and the concatenation of CONCH and HIPT features gives the strongest performance. Ablations show that concatenation outperforms patch-level attention for combining models, and that AB-MIL-style WSI aggregation beats the alternatives tested. The results support the value of foundation-model representations for histopathology and suggest further exploration of model aggregation strategies to improve treatment-response prediction and clinical decision support.
Abstract
Predicting a patient's response to a cancer treatment is of high interest. Nonetheless, this task remains challenging from a medical point of view because of the complexity of the interaction between the patient's organism and the treatment considered. Recent work on foundation models pre-trained with self-supervised learning on large-scale unlabeled histopathology datasets has opened a new direction for developing methods for cancer-diagnosis-related tasks. In this article, we propose a novel methodology for predicting the treatment response of Diffuse Large B-Cell Lymphoma patients from Whole Slide Images. Our method uses several foundation models as feature extractors to obtain local representations of the image, each corresponding to a small region of the tissue. A global representation of the image is then obtained by aggregating these local representations using attention-based Multiple Instance Learning. Our experimental study, conducted on a dataset of 152 patients, shows promising results for our methodology, notably by highlighting the advantage of using foundation models over conventional ImageNet pre-training. Moreover, the results demonstrate the potential of foundation models for characterizing histopathology images and generating semantic representations better suited to this task.
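
As an illustration of the two steps described above (local features from several extractors, then attention-based aggregation), the following is a minimal sketch and not the authors' implementation. It assumes the patch features have already been extracted, and uses random arrays in their place. The function names, feature dimensions, and randomly initialised parameters `V` and `w` are all hypothetical. The pooling shown is the ungated attention form commonly used for attention-based MIL; the exact variant used in this work is not specified in the abstract.

```python
import numpy as np


def concat_patch_features(feature_sets):
    """Concatenate per-patch features from several extractors along the feature axis.

    Each array has shape (n_patches, dim_i); all must describe the same patches.
    """
    n_patches = feature_sets[0].shape[0]
    if any(f.shape[0] != n_patches for f in feature_sets):
        raise ValueError("all feature sets must cover the same patches")
    return np.concatenate(feature_sets, axis=1)


def abmil_pool(patch_features, V, w):
    """Ungated attention pooling over the patches of one slide.

    Attention weight of patch k: softmax over k of w^T tanh(V^T h_k).
    Slide embedding: attention-weighted sum of the patch features h_k.
    """
    hidden = np.tanh(patch_features @ V)    # (n_patches, hidden_dim)
    scores = hidden @ w                     # (n_patches,)
    scores = scores - scores.max()          # stabilise the softmax numerically
    weights = np.exp(scores)
    weights = weights / weights.sum()       # weights are positive and sum to 1
    embedding = weights @ patch_features    # (feature_dim,)
    return embedding, weights


# Hypothetical stand-ins for features from two foundation models (dimensions are made up).
rng = np.random.default_rng(0)
n_patches = 6
feats_a = rng.normal(size=(n_patches, 8))
feats_b = rng.normal(size=(n_patches, 4))

patches = concat_patch_features([feats_a, feats_b])

# Randomly initialised attention parameters; in practice these would be learned.
V = rng.normal(size=(patches.shape[1], 5))
w = rng.normal(size=5)

wsi_embedding, attention = abmil_pool(patches, V, w)
print(wsi_embedding.shape)
```

In a trained model, `V` and `w` would be learned jointly with a classifier applied to the slide embedding, and the attention weights indicate how much each patch contributes to that embedding.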
