iPac: Incorporating Intra-image Patch Context into Graph Neural Networks for Medical Image Classification
Usama Zidan, Mohamed Gaber, Mohammed M. Abdelsamea
TL;DR
iPac addresses the limited capture of intra-image structure in GNN-based medical image classification by converting images into graphs. It partitions each image into $P$ patches, encodes them with a Swin Transformer autoencoder, clusters the encoded patches into $C$ centroids, and builds an adjacency-based graph whose edges carry meaningful relationships; edge-aware GNNs with message passing then perform classification. The approach yields an average accuracy improvement of up to $5\%$ over baselines on MedMNIST datasets, and extensive ablations and hyperparameter studies support the design choices. This graph-based, patch-centric framework offers a versatile and interpretable solution for medical image analysis and can be extended to other domains that require structure-aware representations.
Abstract
Graph neural networks have emerged as a promising paradigm for image processing, yet their performance in image classification tasks is hindered by a limited consideration of the underlying structure and relationships among visual entities. This work presents iPac, a novel approach that introduces a new graph representation of images to improve graph neural network classification of medical images by accounting for this underlying structure and these relationships. iPac integrates patch partitioning, feature extraction, clustering, graph construction, and graph-based learning into a unified network. By capturing relevant features and organising them into clusters, we construct a meaningful graph representation that effectively encapsulates the semantics of the image. Experimental evaluation on diverse medical image datasets demonstrates the efficacy of iPac, which achieves an average accuracy improvement of up to 5% over baseline methods. Our approach offers a versatile and generic solution for image classification, particularly for medical images, by leveraging the graph representation and the inherent structure and relationships among visual entities.
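The stages named in the abstract (patch partitioning, feature extraction, clustering, graph construction) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Raw pixel values stand in for the Swin Transformer autoencoder embeddings, a small hand-written k-means stands in for the clustering stage, and the edge rule (connect spatially neighbouring patches, with a binary attribute marking whether both endpoints share a cluster) is an assumption made purely for illustration. All function names are hypothetical.

```python
import numpy as np


def partition_patches(image, patch_size):
    """Split an (H, W) image into non-overlapping flattened patches in row-major grid order."""
    h, w = image.shape
    gh, gw = h // patch_size, w // patch_size
    cropped = image[: gh * patch_size, : gw * patch_size]
    patches = cropped.reshape(gh, patch_size, gw, patch_size).swapaxes(1, 2)
    return patches.reshape(gh * gw, patch_size * patch_size), (gh, gw)


def kmeans(features, n_clusters, n_iters=20, seed=0):
    """Plain k-means; a stand-in for whatever clustering the method actually uses."""
    rng = np.random.default_rng(seed)
    init = rng.choice(len(features), size=n_clusters, replace=False)
    centroids = features[init].astype(float)
    for _ in range(n_iters):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    # Final assignment against the last centroid update.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def grid_adjacency_edges(grid_shape, labels):
    """Connect each patch to its right and lower neighbour (both directions).

    Assumed edge attribute: 1.0 if both endpoints share a cluster, else 0.0.
    """
    gh, gw = grid_shape
    edges, edge_attr = [], []
    for r in range(gh):
        for c in range(gw):
            i = r * gw + c
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < gh and cc < gw:
                    j = rr * gw + cc
                    same = float(labels[i] == labels[j])
                    edges += [(i, j), (j, i)]
                    edge_attr += [same, same]
    return np.array(edges).T, np.array(edge_attr)


# Usage: a random 28x28 array stands in for a single-channel image.
image = np.random.default_rng(0).random((28, 28))
patches, grid = partition_patches(image, patch_size=7)   # P = 16 patches
labels, centroids = kmeans(patches, n_clusters=3)        # C = 3 centroids
edge_index, edge_attr = grid_adjacency_edges(grid, labels)
```

The resulting `patches` (node features), `edge_index`, and `edge_attr` follow the layout that edge-aware GNN layers in common graph libraries typically consume. In the described method the node features would come from the learned encoder rather than raw pixels.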
