Histopathology image embedding based on foundation models features aggregation for patient treatment response prediction

Bilel Guetarni, Feryal Windal, Halim Benhabiles, Mahfoud Chaibi, Romain Dubois, Emmanuelle Leteurtre, Dominique Collard

TL;DR

This work addresses the prediction of Diffuse Large B-Cell Lymphoma (DLBCL) treatment response from histopathology whole slide images (WSIs). It proposes a patch-based representation built from multiple foundation models pre-trained on large histopathology datasets: the patch-level embeddings from each model are concatenated, then aggregated via attention-based Multiple Instance Learning (MIL) to form a WSI embedding. On a dataset of 152 patients and 384 WSIs, foundation-model features outperform ImageNet baselines, with the concatenation of CONCH and HIPT giving the strongest performance. Ablations show that concatenation outperforms patch-level attention for combining models, and that AB-MIL-style WSI aggregation outperforms the alternative aggregation methods tested. The results support the value of foundation-model representations for histopathology and suggest further exploration of model aggregation strategies to improve treatment-response prediction and clinical decision support.

Abstract

Predicting a patient's response to cancer treatment is of high interest. Nonetheless, this task remains challenging from a medical point of view because of the complexity of the interaction between the patient's organism and the treatment considered. Recent work on foundation models pre-trained with self-supervised learning on large-scale unlabeled histopathology datasets has opened a new direction for developing methods for cancer diagnosis-related tasks. In this article, we propose a novel methodology for predicting the treatment response of Diffuse Large B-Cell Lymphoma patients from Whole Slide Images. Our method uses several foundation models as feature extractors to obtain local representations of the image, each corresponding to a small region of the tissue. A global representation of the image is then obtained by aggregating these local representations using attention-based Multiple Instance Learning. Our experimental study, conducted on a dataset of 152 patients, shows promising results for our methodology, notably highlighting the advantage of foundation models over conventional ImageNet pre-training. Moreover, the results demonstrate the potential of foundation models for characterizing histopathology images and generating semantic representations better suited to this task.

Paper Structure (14 sections, 4 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: WSI-based treatment response prediction for DLBCL patients with foundation models. Patch-level embeddings, extracted by multiple foundation models, are concatenated to create diverse and rich patch embeddings. An attention-based aggregation function is then applied to these, in order to obtain a WSI-level embedding. The treatment response is then predicted from this embedding.
  • Figure 2: Comparison of the performance of our treatment response prediction method by exploiting several foundation models and ResNet-50.
  • Figure 3: Comparison of MIL methods with different feature extractors. Results are averaged from 4 runs and standard deviations are reported as error bars.
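
The pipeline summarized in the Figure 1 caption (concatenate patch-level embeddings from several extractors, then pool them with attention into one WSI-level embedding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the standard non-gated attention-MIL pooling, a_k = softmax_k(w^T tanh(V h_k)) and z = sum_k a_k h_k; the paper's exact attention variant, layer sizes, and classifier head are not given in this summary. The function names, the random stand-in features, and all dimensions below are hypothetical placeholders, not real CONCH or HIPT outputs.

```python
import numpy as np


def concat_patch_embeddings(per_model_feats):
    """Concatenate per-patch features from several extractors along the feature axis.

    Each array has shape (n_patches, d_model) and must describe the same patches
    in the same order.
    """
    if len({f.shape[0] for f in per_model_feats}) != 1:
        raise ValueError("all extractors must embed the same set of patches")
    return np.concatenate(per_model_feats, axis=1)


def abmil_pool(H, V, w):
    """Non-gated attention-based MIL pooling (assumed variant).

    H: (n_patches, d) patch embeddings
    V: (hidden, d) projection matrix, w: (hidden,) scoring vector
    Returns the slide embedding z of shape (d,) and the attention weights.
    """
    scores = np.tanh(H @ V.T) @ w      # one scalar score per patch
    scores = scores - scores.max()     # stabilize the softmax numerically
    attn = np.exp(scores)
    attn = attn / attn.sum()           # weights are positive and sum to 1
    return attn @ H, attn              # weighted average of patch embeddings


# Random stand-ins for the outputs of two feature extractors (placeholder sizes).
rng = np.random.default_rng(0)
n_patches = 6
feats_a = rng.normal(size=(n_patches, 8))
feats_b = rng.normal(size=(n_patches, 4))

H = concat_patch_embeddings([feats_a, feats_b])   # shape (6, 12)

# Attention parameters would be learned in practice; random here for illustration.
hidden = 5
V = rng.normal(scale=0.1, size=(hidden, H.shape[1]))
w = rng.normal(scale=0.1, size=hidden)

z, attn = abmil_pool(H, V, w)                     # z is the WSI-level embedding
```

In a trained model, `V` and `w` would be learned jointly with a classifier that maps `z` to the treatment-response label. The attention weights give a per-patch indication of how much each tissue region contributed to the slide embedding.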