SPIRONet: Spatial-Frequency Learning and Topological Channel Interaction Network for Vessel Segmentation

De-Xing Huang; Xiao-Hu Zhou; Xiao-Liang Xie; Shi-Qi Liu; Shuang-Yi Wang; Zhen-Qiu Feng; Mei-Jiang Gui; Hao Li; Tian-Yu Xiang; Bo-Xian Yao; Zeng-Guang Hou

TL;DR

The paper tackles real-time vessel segmentation in intraoperative imaging, where low SNR and complex vessel morphology hinder accuracy. It introduces SPIRONet, a dual-encoder network that separately learns local spatial features and global frequency features via a Fourier-based frequency encoder, integrates them with a cross-attention fusion module, and further refines multi-channel responses through a topological channel interaction (TCI) module based on graph neural networks. SPIRONet achieves state-of-the-art performance on four challenging benchmarks (CADSA, CAXF, DCA1, XCAD) with an inference speed around 21 FPS on 512×512 images, making it suitable for real-time vascular navigation systems. The work provides comprehensive ablation and visualization analyses that demonstrate the benefits of combining spatial-frequency representations and graph-based channel refinement for robust, accurate vessel segmentation in noisy intraoperative data.

Abstract

Automatic vessel segmentation is paramount for developing next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performance due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel interaction network (SPIRONet) is proposed to address these issues. Specifically, dual encoders are utilized to comprehensively capture local spatial and global frequency vessel features. Then, a cross-attention fusion module is introduced to effectively fuse spatial and frequency features, thereby enhancing feature discriminability. Furthermore, a topological channel interaction module based on graph neural networks is designed to filter out task-irrelevant responses. Extensive experimental results on several challenging datasets (CADSA, CAXF, DCA1, and XCAD) demonstrate the state-of-the-art performance of our method. Moreover, the inference speed of SPIRONet is 21 FPS with a 512×512 input size, surpassing clinical real-time requirements (6-12 FPS). These promising outcomes indicate SPIRONet's potential for integration into vascular interventional navigation systems. Code is available at https://github.com/Dxhuang-CASIA/SPIRONet.
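To make the idea of "global frequency features" concrete, the following is a minimal NumPy sketch of filtering a feature map in the 2D Fourier domain. It is an illustration under stated assumptions, not the authors' frequency encoder: the function name `global_frequency_filter` and the complex `weight` tensor are hypothetical, and a real encoder would learn the weights and add normalization and nonlinearities.

```python
import numpy as np


def global_frequency_filter(feature, weight):
    """Filter a feature map in the 2D Fourier domain (illustrative sketch).

    feature: real array of shape (C, H, W)
    weight:  complex array of shape (C, H, W // 2 + 1); a hypothetical
             stand-in for parameters a frequency encoder might learn
    """
    # Real-input 2D FFT over the spatial axes: (C, H, W) -> (C, H, W // 2 + 1)
    spectrum = np.fft.rfft2(feature, axes=(-2, -1))
    # Element-wise product in frequency space equals a circular convolution
    # over the whole image, so every output pixel depends on every input pixel.
    filtered = spectrum * weight
    # Back to the spatial domain at the original resolution
    return np.fft.irfft2(filtered, s=feature.shape[-2:], axes=(-2, -1))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))

# An all-ones weight leaves the feature map unchanged (up to float error).
identity = np.ones((4, 8, 8 // 2 + 1), dtype=complex)
y = global_frequency_filter(x, identity)

# A single-pixel impulse contributes to every frequency bin with magnitude 1,
# which is why frequency-domain features are global in nature.
impulse = np.zeros((8, 8))
impulse[3, 5] = 1.0
impulse_spectrum = np.fft.rfft2(impulse)
```

The identity check confirms the transform pair round-trips, and the impulse check shows why a single multiplication per frequency bin mixes information across the full spatial extent, which a local convolution kernel cannot do in one layer.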

Paper Structure (26 sections, 13 equations, 7 figures, 6 tables, 1 algorithm)

Introduction
Related works
Traditional vessel segmentation approaches
Vessel segmentation based on deep learning
Learning from frequency domain
Channel refinement module
Methodology
Preliminaries: 2D Fourier transform
Overall architecture
Spatial-frequency representation learning
Cross-attention fusion
Topological channel interaction
Loss function
Experimental setup
Datasets
...and 11 more sections

Figures (7)

Figure 1: Illustration of challenges in vessel segmentation. i) Low signal-to-noise ratio (SNR). ii) Small or slender vessel branches. iii) Non-target and motion artifact interference. X-ray fluoroscopy images and their corresponding ground truths are from the XCAD dataset ma2021self.
Figure 2: The overview of our SPIRONet. It adopts a spatial encoder and a frequency encoder to capture complementary spatial and frequency vessel features. These two kinds of features are fused effectively by cross-attention fusion modules. The fused features are fed into CNN decoders to recover the original resolutions. After that, multi-channel features containing class-specific responses are refined by a topological channel interaction module based on GNNs. Finally, vessel predictions are obtained through a segmentation head.
Figure 3: The architecture of encoder blocks. (a) Spatial encoder block; (b) Frequency encoder block. $\oplus$ denotes element-wise addition.
Figure 4: Cross-attention module. $\otimes$, $\oplus$, and $\textcircled{c}$ indicate matrix multiplication, element-wise addition, and channel-dimension concatenation, respectively.
Figure 5: Topological channel interaction module. $\oplus$ represents element-wise addition.
...and 2 more figures
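
The three operations named in the Figure 4 caption (matrix multiplication, element-wise addition, and channel-dimension concatenation) can be combined into a bidirectional cross-attention fusion as sketched below. This is a hedged illustration only: the functions `cross_attend` and `fuse` are hypothetical names, the learned query/key/value projections of a real attention layer are omitted, and the exact wiring of the paper's module may differ.

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(query_feat, context_feat):
    """Let one branch attend to the other (no learned projections here).

    query_feat, context_feat: token matrices of shape (N, C),
    e.g. an H x W feature map flattened to N = H * W tokens.
    """
    d = query_feat.shape[-1]
    scores = query_feat @ context_feat.T / np.sqrt(d)  # matrix multiplication, (N, N)
    attn = softmax(scores, axis=-1)                    # each row sums to 1
    return query_feat + attn @ context_feat            # element-wise (residual) addition


def fuse(spatial, frequency):
    # Attend in both directions, then concatenate along the channel axis
    s_enhanced = cross_attend(spatial, frequency)
    f_enhanced = cross_attend(frequency, spatial)
    return np.concatenate([s_enhanced, f_enhanced], axis=-1)  # (N, 2C)


rng = np.random.default_rng(0)
spatial_tokens = rng.standard_normal((16, 8))    # assumed: 4x4 map, 8 channels
frequency_tokens = rng.standard_normal((16, 8))
fused = fuse(spatial_tokens, frequency_tokens)
```

Because each branch queries the other, the fused output carries both the local detail of the spatial tokens and the global context of the frequency tokens; a following layer would typically reduce the doubled channel count back to C.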
