GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation

Vishnuvardhan Purma, Suhas Srinath, Seshan Srirangarajan, Aanchal Kakkar, Prathosh A. P

TL;DR

Generative diffusion is proposed as the pretext task for self-supervised histopathological image segmentation, based on the observation that diffusion models effectively solve an image-to-image translation task akin to segmentation.

Abstract

Histopathological image segmentation is a laborious and time-intensive task, often requiring analysis from experienced pathologists for accurate examinations. To reduce this burden, supervised machine-learning approaches have been adopted using large-scale annotated datasets for histopathological image analysis. However, in several scenarios, the availability of large-scale annotated data is a bottleneck while training such models. Self-supervised learning (SSL) is an alternative paradigm that provides some respite by constructing models using only unannotated data, which is often abundant. The basic idea of SSL is to train a network to perform one or more pseudo or pretext tasks on unannotated data and subsequently use it as the basis for a variety of downstream tasks. The success of SSL depends critically on the chosen pretext task. While there have been many efforts in designing pretext tasks for classification problems, there have not been many attempts at SSL for histopathological segmentation. Motivated by this, we propose an SSL approach for segmenting histopathological images via generative diffusion models. Our method is based on the observation that diffusion models effectively solve an image-to-image translation task akin to a segmentation task. Hence, we propose generative diffusion as the pretext task for histopathological image segmentation. We also propose multi-loss function-based fine-tuning for the downstream task. We validate our method using several metrics on two publicly available datasets along with a newly proposed head and neck (HN) cancer dataset containing hematoxylin and eosin (H&E) stained images along with annotations. Code will be made public at https://github.com/suhas-srinath/GenSelfDiff-HIS.

Paper Structure (28 sections, 12 equations, 5 figures, 15 tables)

Figures (5)

  • Figure 1: An overview of the proposed framework. (a) Self-supervised pre-training using diffusion: The U-Net model (encoder-decoder) takes the corrupted version $\mathbf{x}_t$ of the image $\mathbf{x}_0$ and the corresponding time embedding $t_e$ as input to predict the noise that takes $\mathbf{x}_0$ to $\mathbf{x}_t$, using the P2-weighted loss [b45]. $f(\cdot)$ denotes the function that recovers $\mathbf{x}_{t-1}$ from $\mathbf{x}_t$. (b) Downstream segmentation: The self-supervised pre-trained U-Net is fine-tuned end-to-end in a supervised manner to predict the segmentation masks.
  • Figure 2: Sample real and generated patches using diffusion on three datasets: GlaS [b50], MoNuSeg [b51], and HN cancer (ours). The first four images in each row are real image patches, and the last four are generated image patches.
  • Figure 3: Qualitative results of the proposed method (diffusion) along with other pretext tasks (top row): Context Restoration [b4], Contrastive learning [b40], CS-CO [b41], CycleGAN [b69], DIM [b52], Inpainting [b8], and VAE [b18]. The bottom row contains fully supervised and DDPM-based methods: Attention UNet [b66], CIMD [b72], Wolleb et al. [b73], FCT [b78], Baranchuk et al. [b74], MedSegDiff [b21], UNet [b2], and Diffusion (Ours).
  • Figure 4: Confusion matrices for MoNuSeg (a) with the boundary and nucleus merged (class 1 being nucleus), and (b) with the boundary separated (classes 0, 1, and 2 correspond to background, boundary, and nucleus, respectively).
  • Figure 5: Qualitative results of the proposed multi-loss function on all three datasets. The losses are abbreviated as CE (cross-entropy), FL (focal loss), and SS (structural similarity).
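
The pre-training step described in the caption of Figure 1(a) can be illustrated with a minimal sketch: an image $\mathbf{x}_0$ is corrupted into $\mathbf{x}_t$, and the loss compares the predicted noise with the true noise under a P2-style weight. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear variance schedule and its endpoints, the weight form $1/(k + \mathrm{SNR}(t))^{\gamma}$ with $k = 1$ and $\gamma = 1$, and all function names. The U-Net is replaced by a placeholder that predicts zeros.

```python
import math
import random


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (assumed; the caption does not specify one)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_s), one value per diffusion step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, abar_t, rng):
    """Corrupt a flattened image x0 into x_t; returns (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    xt = [a * x + s * e for x, e in zip(x0, eps)]
    return xt, eps


def p2_weight(abar_t, k=1.0, gamma=1.0):
    """Assumed P2 weight: 1 / (k + SNR(t))**gamma, SNR(t) = abar_t / (1 - abar_t)."""
    snr = abar_t / (1.0 - abar_t)
    return 1.0 / (k + snr) ** gamma


def p2_weighted_loss(eps_pred, eps, abar_t, k=1.0, gamma=1.0):
    """Mean squared error on the noise, scaled by the P2 weight for step t."""
    mse = sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)
    return p2_weight(abar_t, k, gamma) * mse


# Usage: one pre-training step on a toy 16-pixel "image" scaled to [-1, 1].
rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]
t = 500
xt, eps = q_sample(x0, abar[t], rng)
eps_pred = [0.0] * len(eps)  # placeholder standing in for the U-Net output
loss = p2_weighted_loss(eps_pred, eps, abar[t])
```

With these assumed settings the weight reduces to $1 - \bar{\alpha}_t$, so low-noise steps (where $\bar{\alpha}_t$ is close to 1) contribute less to the loss than heavily corrupted ones.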