H-SGANet: Hybrid Sparse Graph Attention Network for Deformable Medical Image Registration

Yufeng Zhou, Wenming Cao

TL;DR

H-SGANet addresses deformable medical image registration (DMIR) with a lightweight hybrid ConvNet–ViG–Transformer backbone built around two modules: Sparse Graph Attention (SGA) and SSAFormer. SGA uses fixed anatomical connectivity to expand the receptive field without the KNN search and data reshaping that ViG requires. SSAFormer replaces conventional self-attention with Separable Self-Attention (SSA), whose cost grows linearly with the number of tokens, which makes long-range dependency modeling efficient in 3D volumes. The approach reaches state-of-the-art or competitive Dice scores with far fewer parameters (≈0.382M) and lower computational overhead, outperforming baselines such as VoxelMorph on OASIS and LPBA40 while keeping deformation fields smooth (low NJD). The work shows that brain-informed graph connectivity can be embedded in a hybrid ConvNet–ViG–Transformer for DMIR, and the design may extend to other medical imaging tasks and to broader applications that need efficient long-range spatial reasoning.
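To make the linear-complexity claim concrete, here is a minimal NumPy sketch of separable self-attention as it is commonly formulated: each token is scored against a single latent vector, the scores weight the keys into one context vector, and that vector is broadcast back onto the values. This is an illustration under assumptions, not the authors' SSAFormer: the function name, the weight names (`w_latent`, `w_key`, `w_value`, `w_out`), the ReLU on the values, and the toy sizes are all hypothetical, and the depth-wise convolution that SSAFormer adds is left out.

```python
import numpy as np

def separable_self_attention(x, w_latent, w_key, w_value, w_out):
    """Separable self-attention over k tokens of width d.

    x: (k, d) token matrix. Returns a (k, d) matrix.
    No k-by-k attention matrix is ever formed, so cost is linear in k.
    """
    # One scalar score per token, measured against a single latent vector.
    scores = x @ w_latent                              # (k,)
    scores = scores - scores.max()                     # numerical stability
    context_scores = np.exp(scores) / np.exp(scores).sum()

    # Weighted sum of keys gives one global context vector.
    keys = x @ w_key                                   # (k, d)
    context_vector = context_scores @ keys             # (d,)

    # Broadcast the context vector onto every token's value (element-wise).
    values = np.maximum(x @ w_value, 0.0)              # (k, d), ReLU assumed
    return (values * context_vector) @ w_out           # (k, d)

# Toy usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
k, d = 512, 16
x = rng.standard_normal((k, d))
w_latent = rng.standard_normal(d)
w_key, w_value, w_out = (rng.standard_normal((d, d)) for _ in range(3))
out = separable_self_attention(x, w_latent, w_key, w_value, w_out)
print(out.shape)  # (512, 16)
```

Because the only interaction between tokens is the single weighted sum that builds `context_vector`, doubling the number of tokens roughly doubles the work, whereas standard self-attention would quadruple the size of its attention matrix.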

Abstract

The integration of Convolutional Neural Networks (ConvNets) and Transformers has emerged as a strong candidate for image registration, leveraging the strengths of both models and a large parameter space. However, this hybrid model treats brain MRI volumes as grid or sequence structures, so it struggles to represent anatomical connectivity, diverse brain regions, and the vital connections that shape the brain's internal architecture. Its computational expense and GPU memory usage are also a concern. To tackle these issues, a lightweight hybrid sparse graph attention network (H-SGANet) has been developed. The network is built around a central mechanism, Sparse Graph Attention (SGA), which is based on a Vision Graph Neural Network (ViG) with predetermined anatomical connections. The SGA module expands the model's receptive field and integrates seamlessly into the network. To further amplify the advantages of the hybrid network, Separable Self-Attention (SSA) is employed as an enhanced token mixer and combined with depth-wise convolution to form SSAFormer, a design intended to extract long-range dependencies more effectively. As a hybrid ConvNet-ViG-Transformer model, H-SGANet offers threefold benefits for volumetric medical image registration. It optimizes fixed and moving images concurrently through a hybrid feature fusion layer and an end-to-end learning framework. Compared to VoxelMorph, a model with a similar parameter count, H-SGANet improves the Dice score by 3.5% on the OASIS dataset and by 1.5% on the LPBA40 dataset.
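The Dice score used for these comparisons measures how well an anatomical label in the warped moving image overlaps the same label in the fixed image. The following is a minimal sketch, assuming integer-valued segmentation volumes; the function name and the toy 4×4×4 volumes are hypothetical and are not taken from the paper's evaluation code.

```python
import numpy as np

def dice_score(seg_a, seg_b, label):
    """Dice overlap for one label: 2 * |A and B| / (|A| + |B|)."""
    a = seg_a == label
    b = seg_b == label
    denom = a.sum() + b.sum()
    if denom == 0:
        # The label is absent from both volumes, so Dice is undefined.
        return float("nan")
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: two 8-voxel cubes that share 4 voxels.
fixed_seg = np.zeros((4, 4, 4), dtype=int)
fixed_seg[1:3, 1:3, 1:3] = 1
warped_seg = np.zeros((4, 4, 4), dtype=int)
warped_seg[1:3, 1:3, 2:4] = 1

dice = dice_score(fixed_seg, warped_seg, label=1)
print(dice)  # 0.5
```

In a registration benchmark this value would typically be averaged over the labeled structures and over the test image pairs.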

Paper Structure

This paper contains 22 sections, 16 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overall block diagram of H-SGANet. It features two core modules: (a) The SGA block uses the graph representation depicted in block (c) to learn both global and graph-level representations. (b) The SSAFormer, in contrast to the attention matrix of multi-head attention (MHA), generates context scores and vectors using efficient element-wise operations, optimized for hybrid models that process volumetric data.
  • Figure 2: (a) In ViG, KNN graph attention allows the corner voxel of a 9×9×9 patch to form relationships with other voxels via random sampling. (b) SGA at the corresponding voxel position. In contrast to (a), SGA utilizes a structured graph, thereby eliminating the computationally intensive requirements of KNN and the need for data reshaping.
  • Figure 3: Overall block diagram of SSAFormer. Depthwise convolution and pointwise scaling are used as the channel MLP to capture texture information. The Token Mixer calculates a context score for each token with respect to a latent token $L$, which yields a context vector $c_v$ that captures global context with linear complexity $O(k)$.
  • Figure 4: Qualitative comparison of various registration methods on the OASIS dataset. The lateral ventricles, putamen, and brain stem are color-coded yellow, blue, and red, respectively. The top-left, top-right, and bottom panels display results on sagittal, coronal, and axial slices. In each panel, the first column shows the fixed image, the moving image, and their overlap (a larger purple area indicates more overlap). The second and fourth rows show the deformation field and its meshing. The bottom panel, rendered exclusively with ITK-SNAP, shows deformation fields and grids for clearer differentiation between methods. The deformed grids in the top-left panel (4th row) were generated with the Matplotlib library in Python. The deformation fields in the top-right panel (2nd row) map the spatial dimensions $x$, $y$, and $z$ to the RGB color channels, respectively.
  • Figure 5: Quantitative analysis of Jacobian overlap. Color maps depict the Jacobian determinant of the registration fields obtained by the different methods on the OASIS dataset. Regions highlighted in bright red mark voxel folding, that is, voxels where the Jacobian determinant is less than or equal to 0.
  • ...and 3 more figures
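
The folding criterion in the Figure 5 caption (Jacobian determinant less than or equal to 0) can be sketched as follows. This is a minimal illustration, assuming a dense displacement field stored as a `(3, D, H, W)` array in voxel units so that the transformation is the identity plus the displacement; the function names and the toy fields are hypothetical, and the paper's own evaluation code may differ in finite-difference scheme and boundary handling.

```python
import numpy as np

def jacobian_determinant(disp):
    """Jacobian determinant of phi(x) = x + disp(x) at every voxel.

    disp: (3, D, H, W) displacement field in voxel units.
    Returns a (D, H, W) array of det(I + grad(disp)).
    """
    # grads[i, j] holds d disp_i / d x_j, estimated with finite differences.
    grads = np.stack(
        [np.stack(np.gradient(disp[i]), axis=0) for i in range(3)], axis=0
    )                                                   # (3, 3, D, H, W)
    jac = grads + np.eye(3).reshape(3, 3, 1, 1, 1)      # add the identity
    jac = np.moveaxis(jac, (0, 1), (-2, -1))            # (D, H, W, 3, 3)
    return np.linalg.det(jac)

def folding_fraction(disp):
    """Fraction of voxels whose Jacobian determinant is <= 0."""
    return float((jacobian_determinant(disp) <= 0).mean())

D = H = W = 6
# Zero displacement: identity transform, determinant 1 everywhere.
identity_field = np.zeros((3, D, H, W))

# Displacement -2 * x along the first axis: the transform mirrors that axis,
# so the determinant is -1 everywhere and every voxel counts as folded.
flipped_field = np.zeros((3, D, H, W))
flipped_field[0] = -2.0 * np.arange(D, dtype=float).reshape(D, 1, 1)

print(folding_fraction(identity_field))  # 0.0
print(folding_fraction(flipped_field))   # 1.0
```

A smooth, invertible registration field should keep this fraction close to zero, which is what the bright red regions in the color maps let a reader check visually.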