Multi-View Deformable Convolution Meets Visual Mamba for Coronary Artery Segmentation

Xiaochan Yuan, Pai Zeng

Abstract

Accurate segmentation of coronary arteries from computed tomography angiography (CTA) images is of paramount clinical importance for the diagnosis and treatment planning of cardiovascular diseases. However, coronary artery segmentation remains challenging due to the inherent multi-branching and slender tubular morphology of the vasculature, compounded by severe class imbalance between foreground vessels and background tissue. Conventional convolutional neural network (CNN)-based approaches struggle to capture long-range dependencies among spatially distant vascular structures, while Vision Transformer (ViT)-based methods incur prohibitive computational overhead that hinders deployment in resource-constrained clinical settings. Motivated by the recent success of state space models (SSMs) in efficiently modeling long-range sequential dependencies with linear complexity, we propose MDSVM-UNet, a novel two-stage coronary artery segmentation framework that synergistically integrates multidirectional snake convolution (MDSConv) with residual visual Mamba (RVM). In the encoding stage, we introduce MDSConv, a deformable convolution module that learns adaptive offsets along three orthogonal anatomical planes -- sagittal, coronal, and axial -- thereby enabling comprehensive multi-view feature fusion that faithfully captures the elongated and tortuous geometry of coronary vessels. In the decoding stage, we design an RVM-based upsampling decoder block that leverages selective state space mechanisms to model inter-slice long-range dependencies while preserving linear computational complexity. Furthermore, we propose a progressive two-stage segmentation strategy: the first stage performs coarse whole-image segmentation to guide intelligent block extraction, while the second stage conducts fine-grained block-level segmentation to recover vascular details and suppress false positives.

Paper Structure (28 sections, 14 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Overview of the proposed two-stage coronary artery segmentation framework. In Stage 1, the input CTA volume is downsampled and segmented using MDSVM-UNet to produce a coarse segmentation map. The coarse results guide the extraction of $64 \times 64 \times 64$ voxel blocks from the original resolution. In Stage 2, MDSVM-UNet performs fine-grained block-level segmentation, and the results from all blocks are merged to produce the final output.
  • Figure 2: Detailed architecture of MDSVM-UNet. (a) The overall encoder-decoder network with UNet++-style dense skip connections. The encoder comprises four MDSConv blocks and one RVM block, while the decoder consists of three RVM blocks and a convolutional output layer. (b) The Residual Visual Mamba (RVM) layer with residual connection and scaling factor. (c) The Vision State Space Module (VSSM) with dual-branch parallel processing.
  • Figure 3: Architecture of the Multidirectional Snake Convolution (MDSConv) layer. Input features are processed by standard convolution and three axis-specific snake convolutions (along $x$, $y$, $z$ axes), followed by concatenation and fusion.
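
The coarse-to-fine hand-off described in Figure 1 can be illustrated with a minimal sketch. The selection rule is not given in the text above, so everything here is an assumption made for illustration: the helper name `block_origins`, the regular grid of candidate blocks, the integer downsampling factor `scale`, and the rule "keep any block that contains coarse foreground" are all hypothetical. Only the $64 \times 64 \times 64$ block size comes from the caption.

```python
from typing import Iterable, List, Set, Tuple

Coord = Tuple[int, int, int]


def block_origins(
    coarse_fg: Iterable[Coord],
    scale: int,
    full_shape: Coord,
    block: int = 64,
) -> List[Coord]:
    """Return sorted origins of block-sized cubes that cover coarse foreground.

    coarse_fg  -- foreground voxel coordinates in the downsampled Stage 1 mask
    scale      -- assumed integer downsampling factor between the two stages
    full_shape -- shape of the original-resolution volume
    block      -- edge length of the extracted cubes (64 in Figure 1)

    Assumption: `block` is a multiple of `scale`, so each coarse voxel maps
    into exactly one cell of the regular block grid.
    """
    origins: Set[Coord] = set()
    for voxel in coarse_fg:
        origin = []
        for v, size in zip(voxel, full_shape):
            full = min(v * scale, size - 1)            # map to full-resolution index
            start = (full // block) * block            # snap to the block grid
            start = max(0, min(start, size - block))   # keep the cube inside the volume
            origin.append(start)
        origins.add((origin[0], origin[1], origin[2]))
    return sorted(origins)


if __name__ == "__main__":
    # Three coarse foreground voxels; the last two fall into the same block.
    fg = [(0, 0, 0), (40, 10, 5), (40, 11, 5)]
    print(block_origins(fg, scale=2, full_shape=(128, 128, 128)))
```

Under these assumptions, Stage 2 would run only on the cubes starting at the returned origins, which skips most background and is one plausible way to ease the class imbalance mentioned in the abstract. A real pipeline might instead centre blocks on connected components or use overlapping blocks; the source does not say which.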