UnSegMedGAT: Unsupervised Medical Image Segmentation using Graph Attention Networks Clustering

A. Mudit Adityaja, Saurabh J. Shigwan, Nitin Kumar

TL;DR

The proposed unsupervised segmentation framework, built on a pre-trained Dino-ViT, achieves state-of-the-art performance, matching or significantly surpassing existing (semi-)supervised techniques such as MedSAM, a Segment Anything Model for medical images.

Abstract

The data-intensive nature of supervised classification drives researchers' interest toward unsupervised approaches, especially for problems such as medical image segmentation, where labeled data is scarce. Building on recent advances of Vision Transformers (ViT) in computer vision, we propose an unsupervised segmentation framework using a pre-trained Dino-ViT. The proposed method leverages the inherent graph structure within the image to realize a significant performance gain for segmentation in medical images. To this end, we introduce a modularity-based loss function coupled with a Graph Attention Network (GAT) to effectively capture the inherent graph topology within the image. Our method achieves state-of-the-art performance, matching or significantly surpassing existing (semi-)supervised techniques such as MedSAM, a Segment Anything Model for medical images. We demonstrate this on two challenging medical image datasets, ISIC-2018 and CVC-ColonDB. This work underscores the potential of unsupervised approaches for advancing medical image analysis in scenarios where labeled data is scarce. The code is available on GitHub at [https://github.com/mudit-adityaja/UnSegMedGAT].

Paper Structure

This paper contains 7 sections, 8 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: UnSegMedGAT pipeline. We i) extract features $f$ of all (overlapping) image patches using a vision transformer (ViT) and formulate a (complete) graph $\mathcal{G}$ (a few nodes are shown, for illustration, in the same color as the image patch windows); ii) apply a threshold on the similarity (normalized $ff^T$) to select the important edges in $\mathcal{G}$; iii) aggregate and normalize features in the GAT, where darker node colors represent aggregation over a series of GATs; iv) apply a fully connected network (FCN) with softmax activation to obtain node-level clusters. vi) The modularity and regularization-based loss is used to train the model. vii-viii) At inference, edge refinement [barron2016fast] is applied to the predicted mask.
  • Figure 2: Segmentation results on (a)-(b) ISIC-2018, (c)-(d) CVC-ColonDB sample images
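
The graph construction and the modularity term named in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the threshold value `tau`, and the toy features are assumptions made for illustration, and the GAT, the FCN, and the regularization term are omitted. The sketch builds a thresholded cosine-similarity graph from patch features and evaluates the standard Newman modularity $Q = \mathrm{Tr}(C^T B C) / 2m$ with $B = A - kk^T / 2m$ for a cluster assignment matrix $C$.

```python
import numpy as np


def build_thresholded_graph(features, tau=0.5):
    """Adjacency from normalized f f^T, keeping edges with similarity > tau.

    `tau` is a hypothetical threshold; the value used in the paper is not
    given in this excerpt.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                      # cosine similarity between patches
    adj = (sim > tau).astype(float)    # keep only the strong edges
    np.fill_diagonal(adj, 0.0)         # no self-loops
    return adj


def soft_modularity(adj, assign):
    """Newman modularity Q = Tr(C^T B C) / (2m) for (soft) assignments C.

    adj:    (n, n) symmetric adjacency matrix A
    assign: (n, k) rows are cluster membership probabilities (or one-hot)
    """
    degrees = adj.sum(axis=1)          # k_i
    two_m = degrees.sum()              # 2m = sum of degrees
    if two_m == 0:
        return 0.0                     # graph without edges
    b = adj - np.outer(degrees, degrees) / two_m
    return float(np.trace(assign.T @ b @ assign) / two_m)


if __name__ == "__main__":
    # Four toy "patch features": two near one direction, two near another.
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    adj = build_thresholded_graph(feats, tau=0.5)

    good = np.eye(2)[[0, 0, 1, 1]]     # clusters follow the graph structure
    bad = np.eye(2)[[0, 1, 0, 1]]      # clusters cut across it
    print(soft_modularity(adj, good), soft_modularity(adj, bad))
```

In a training setup along these lines, one would minimize the negative of this modularity (plus a regularizer) with respect to the softmax outputs, so that assignments agreeing with the graph's community structure are rewarded.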