UnSegGNet: Unsupervised Image Segmentation using Graph Neural Networks

Kovvuri Sai Gopal Reddy; Bodduluri Saran; A. Mudit Adityaja; Saurabh J. Shigwan; Nitin Kumar

TL;DR

UnSegGNet tackles unsupervised image segmentation by combining a pretrained Vision Transformer feature extractor with a graph-based clustering approach that uses a modularity objective $Q$ (built from the modularity matrix $B$) to maximize intra-cluster edge density without labels. It builds a patch-level graph from ViT features, refines representations through a shallow Graph Convolutional Network, and optimizes a loss that avoids expensive spectral decomposition. Key contributions include introducing a modularity-based clustering loss, leveraging SiLU/SeLU activations, and achieving competitive results across six public datasets spanning medical and natural images. The approach enables cross-domain unsupervised segmentation with efficient computation, and code is publicly available at the linked GitHub repository.
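To make the modularity objective concrete, here is a minimal sketch of the standard graph modularity $Q = \frac{1}{2m}\sum_{ij} B_{ij}\sum_c C_{ic}C_{jc}$, with $B = A - \frac{d d^T}{2m}$. The function name `modularity`, the assignment matrix `assign` (rows are nodes, columns are clusters), and the toy graph are illustrative assumptions, not taken from the paper or its repository. The regularization term mentioned in the Figure 2 caption is not included.

```python
def modularity(adj, assign):
    """Standard modularity Q for a symmetric adjacency matrix `adj`
    and a (hard or soft) cluster-assignment matrix `assign` (n x k).

    Assumption: this is the textbook definition, used here only to
    illustrate the kind of objective the summary refers to.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]          # node degrees d_i
    two_m = sum(deg)                         # 2m = total degree
    q = 0.0
    for i in range(n):
        for j in range(n):
            # Entry of the modularity matrix B = A - d d^T / (2m)
            b_ij = adj[i][j] - deg[i] * deg[j] / two_m
            # Degree to which nodes i and j share a cluster
            same = sum(ci * cj for ci, cj in zip(assign[i], assign[j]))
            q += b_ij * same
    return q / two_m


# Toy graph (hypothetical): two triangles joined by a single edge 2-3.
toy_adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
# One-hot assignment placing each triangle in its own cluster.
two_clusters = [[1, 0]] * 3 + [[0, 1]] * 3
# Degenerate assignment placing every node in the same cluster.
one_cluster = [[1, 0]] * 6

print(modularity(toy_adj, two_clusters))
print(modularity(toy_adj, one_cluster))
```

Splitting the toy graph along the bridging edge scores higher than lumping all nodes together, which scores zero. A training loss would minimize the negative of this quantity over soft assignments, and no eigendecomposition is needed to evaluate it.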

Abstract

Image segmentation, the process of partitioning an image into meaningful regions, plays a pivotal role in computer vision and medical imaging applications. Unsupervised segmentation, which operates without labeled data, remains challenging because of inter-class similarity and variations in intensity and resolution. In this study, we extract high-level features of the input image using a pretrained vision transformer. The proposed method then leverages the underlying graph structure of the image to discover and delineate meaningful boundaries, using graph neural networks and a modularity-based optimization criterion without relying on pre-labeled training data. Experimental results on benchmark datasets demonstrate the effectiveness and versatility of the proposed approach, which performs competitively with state-of-the-art unsupervised segmentation methods. This research contributes to the broader field of unsupervised medical imaging and computer vision by presenting a methodology for image segmentation that aligns with real-world challenges. The proposed method holds promise for diverse applications, including medical imaging, remote sensing, and object recognition, where labeled data may be scarce or unavailable. The code is available on GitHub at https://github.com/ksgr5566/unseggnet

Paper Structure (17 sections, 12 equations, 3 figures, 3 tables)

Introduction
Contribution of this work
Materials and Methods
Pretrained Network
Graph Neural Network
Loss Function
Experiments and Results
Experimental Details
Datasets
CVC-ClinicDB
KVASIR
ISIC-2018
ECSSD
DUTS
CUB
...and 2 more sections

Figures (3)

Figure 1: Segmentation results on (a)-(b) ISIC-2018, (c)-(d) KVASIR, (e)-(f) CVC-ClinicDB sample images
Figure 2: UnSegGNet pipeline: we i) extract features $f$ of all (overlapping) image patches using a vision transformer (ViT) and formulate a (complete) graph $G$ (a few nodes are shown, for illustration, in the same color as the image patch windows), ii) then apply a threshold on the similarity (normalized $ff^T$) to select important edges in $G$, iii) aggregate and normalize features in a graph convolutional network (GCN), where darker node colors represent aggregation, iv) apply a fully connected network (FCN) to finally obtain node-level clusters. vi) The modularity- and regularization-based loss is finally used to train the model. vii-viii) At inference, edge refinement is applied to the predicted mask.
Figure 3: Segmentation results on (a)-(b) ECSSD, (c)-(d) DUTS, (e)-(f) CUB sample images
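
Step ii) of the Figure 2 pipeline, thresholding the normalized $ff^T$ similarity to select edges, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `similarity_graph`, the binary edge weights, the removal of self-loops, the strict `>` comparison, and the toy feature vectors are all hypothetical choices, not details confirmed by the paper.

```python
import math


def similarity_graph(features, threshold):
    """Build a binary adjacency matrix from patch features.

    Each feature vector is L2-normalized so that the dot product of two
    rows is their cosine similarity (an entry of the normalized f f^T).
    An edge is kept when the similarity exceeds `threshold`.
    Assumptions: binary weights and no self-loops.
    """
    unit = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f)) or 1.0  # guard zero vectors
        unit.append([x / norm for x in f])

    n = len(unit)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = sum(a * b for a, b in zip(unit[i], unit[j]))
                adj[i][j] = 1 if sim > threshold else 0
    return adj


# Hypothetical 2-D "patch features": the first two point in nearly the
# same direction, the third is orthogonal to the first.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_graph = similarity_graph(toy_features, threshold=0.5)
print(toy_graph)
```

With these toy vectors only the first two patches end up connected, so the thresholded graph is much sparser than the complete graph it starts from.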
