MSA-UNet3+: Multi-Scale Attention UNet3+ with New Supervised Prototypical Contrastive Loss for Coronary DSA Image Segmentation

Rayan Merghani Ahmed, Adnan Iltaf, Mohamed Elmanna, Gang Zhao, Hongliang Li, Yue Du, Bin Li, Shoujun Zhou

TL;DR

This work targets coronary DSA segmentation, a task hindered by high intra-class variance and severe background diversity. It introduces SPCL (Supervised Prototypical Contrastive Loss), a hybrid loss that blends supervised contrastive learning with prototypical contrastive learning to enforce semantic embeddings in the encoder and to emphasize hard background samples. SPCL is integrated into the MSA-UNet3+ architecture, which features a Multi-Scale Attention Encoder, a Multi-Scale Dilated Bottleneck (MSD-Bottleneck), and a Contextual Attention Fusion Module (CAFM). Empirical results on a private dataset show that SPCL improves performance across multiple baselines and that MSA-UNet3+ with SPCL achieves the highest Dice (87.73%) and F1 (87.78%) scores, with superior boundary accuracy (ASD 0.76, ACD 0.74), highlighting its clinical relevance for identifying coronary stenosis. The framework offers practical benefits for precise vessel delineation and could be extended with multimodal fusion and lightweight variants for real-time clinical deployment.
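
Dice and F1 are overlap metrics on binary vessel masks. The following is a minimal numpy sketch of both, assuming predictions have already been thresholded to {0, 1}; the function names and the empty-mask convention are hypothetical, not taken from the authors' code. For a single binary mask the two metrics are algebraically identical (both reduce to 2TP / (2TP + FP + FN)), so the small gap between the reported Dice and F1 values presumably comes from averaging or thresholding choices that this summary does not describe.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2|P and G| / (|P| + |G|) for binary masks of equal shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # hypothetical convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def f1_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Pixel-wise F1 = harmonic mean of precision and recall."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # vessel predicted as vessel
    fp = np.logical_and(pred, ~target).sum()  # background predicted as vessel
    fn = np.logical_and(~pred, target).sum()  # vessel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


# Toy 2x3 masks: TP = 2, FP = 1, FN = 1, so both metrics equal 4 / 6.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(pred, target), f1_score(pred, target))
```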

Abstract

Accurate segmentation of coronary Digital Subtraction Angiography (DSA) images is essential for diagnosing and treating coronary artery diseases. Despite advances in deep learning, challenges such as high intra-class variance and class imbalance limit precise vessel delineation, and most existing approaches for coronary DSA segmentation do not address these issues. In addition, the encoders of existing segmentation networks do not directly generate semantic embeddings; such well-defined features would let the decoder reconstruct segmentation masks more effectively. We propose a Supervised Prototypical Contrastive Loss (SPCL) that fuses supervised and prototypical contrastive learning to enhance coronary DSA image segmentation. The supervised contrastive loss enforces semantic embeddings in the encoder, improving feature differentiation. The prototypical contrastive loss lets the model focus on the foreground class and alleviates the high intra-class variance and class imbalance problems by concentrating only on hard-to-classify background samples. We implement SPCL within MSA-UNet3+, a Multi-Scale Attention-Enhanced UNet3+ architecture that integrates three key components: a Multi-Scale Attention Encoder and a Multi-Scale Dilated Bottleneck, designed to enhance multi-scale feature extraction, and a Contextual Attention Fusion Module, built to preserve fine-grained details while improving contextual understanding. Experiments on a private coronary DSA dataset show that MSA-UNet3+ outperforms state-of-the-art methods, achieving the highest Dice coefficient and F1-score and significantly reducing ASD and ACD. The framework provides clinicians with precise vessel segmentation, enabling accurate identification of coronary stenosis and supporting informed diagnostic and therapeutic decisions. The code will be released at https://github.com/rayanmerghani/MSA-UNet3plus.
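
The two halves of the loss described above can be sketched in numpy. This is a minimal illustration under stated assumptions, not the authors' SPCL: it assumes encoder features have already been sampled into an (N, D) matrix with binary labels (1 = vessel). The supervised term follows the standard SupCon formulation (Khosla et al., 2020). The prototypical term is a hypothetical simplification that uses a single batch-mean foreground prototype and treats the k background samples most similar to it as hard negatives. The additive weight alpha, the temperature tau, k, and all function names are illustrative choices.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # Project each row onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def supervised_contrastive_loss(z, labels, tau=0.1):
    """SupCon over N >= 2 embeddings z of shape (N, D) with integer labels of shape (N,)."""
    z = l2_normalize(z)
    self_mask = np.eye(len(labels), dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)  # drop a == i from the denominator
    row_max = sim.max(axis=1, keepdims=True)           # log-sum-exp stabilisation
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                                  # anchors with at least one positive
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / n_pos[valid]).mean())


def prototypical_hard_negative_loss(z, labels, tau=0.1, k=4, fg_label=1):
    """Hypothetical variant: contrast a foreground prototype with its k hardest background samples."""
    z = l2_normalize(z)
    fg, bg = z[labels == fg_label], z[labels != fg_label]
    if len(fg) == 0 or len(bg) == 0:
        return 0.0
    proto = l2_normalize(fg.mean(axis=0, keepdims=True))[0]  # (D,) batch-mean prototype
    k = min(k, len(bg))
    hard_neg = np.sort(bg @ proto / tau)[-k:]  # background logits closest to the prototype
    pos = fg @ proto / tau                     # (F,) foreground-to-prototype logits
    logits = np.concatenate([pos[:, None], np.tile(hard_neg, (len(pos), 1))], axis=1)
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    # -log(exp(pos) / (exp(pos) + sum(exp(hard_neg)))) averaged over foreground samples
    return float((lse - pos).mean())


def spcl_loss(z, labels, alpha=0.5, tau=0.1, k=4):
    # Hypothetical additive fusion; the paper's actual weighting is not given in this summary.
    return (supervised_contrastive_loss(z, labels, tau)
            + alpha * prototypical_hard_negative_loss(z, labels, tau, k))
```

Pulling foreground samples toward the prototype while pushing only the most prototype-like background samples away is one way to realize the "hard-to-classify background" emphasis. Easy background pixels contribute nothing to the second term, which is why it is less sensitive to the vessel/background imbalance.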

Paper Structure

This paper contains 26 sections, 3 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Illustration of the desired characteristics of an encoder's semantic embeddings, which should place features from the same class close together and push features from different classes apart. SCE optimizes the embedding space by minimizing the distance between similar foreground samples (blue) and maximizing the distance between dissimilar ones. PCL learns prototypes for the foreground samples (blue stars), pulling the samples toward their respective prototypes while pushing hard negative instances (those close to the prototypes) farther away.
  • Figure 2: The architecture of the proposed MSA-UNet3+ model. The model integrates a Multi-Scale Dilated Bottleneck (MSD-Bottleneck) for multi-scale feature extraction and a Contextual Attention Fusion Module (CAFM) for enhanced contextual understanding. The M-Encoder employs convolutional and transposed convolutional layers, while the decoders reconstruct the segmentation mask. This architecture enables precise segmentation of coronary arteries in DSA images by capturing both fine-grained details and broader structural information.
  • Figure 3: Framework overview of MSA-UNet3+ and the SPCL loss. The encoder extracts multi-scale features. Feature embeddings are jointly optimized with the Supervised Prototypical Contrastive Loss (SPCL), for discriminative representation learning, and a segmentation loss (Dice + Binary Cross-Entropy, BCE), for precise boundary delineation.
  • Figure 4: Architecture of the M-Encoder (Multi-Scale Attention module), highlighting the Squeeze-and-Excitation (SE) block implementation.
  • Figure 5: Detailed architecture of the Multi-Scale Dilated Bottleneck (MSD-Bottleneck) module in the proposed MSA-UNet3+ model.
  • ...and 6 more figures
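
Figure 4 highlights a Squeeze-and-Excitation (SE) block inside the M-Encoder. The following is a minimal numpy sketch of the standard SE mechanism (Hu et al., 2018) for a single (C, H, W) feature map. The captions do not say how the block is wired into the multi-scale encoder, so only the block itself is shown. The weight names and the reduction ratio r = 4 in the usage lines are hypothetical.

```python
import numpy as np


def squeeze_excitation(x, w1, b1, w2, b2):
    """Channel recalibration for x of shape (C, H, W).

    w1: (C // r, C), b1: (C // r,)  -> bottleneck fully connected layer
    w2: (C, C // r), b2: (C,)       -> expansion fully connected layer
    """
    s = x.mean(axis=(1, 2))                      # squeeze: global average pool -> (C,)
    h = np.maximum(0.0, w1 @ s + b1)             # excitation FC 1 + ReLU -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # FC 2 + sigmoid -> (C,), each in (0, 1)
    return x * gate[:, None, None], gate         # rescale every channel by its gate


# Usage with random weights (C = 16 channels, hypothetical reduction ratio r = 4).
rng = np.random.default_rng(0)
C, r = 16, 4
x = rng.standard_normal((C, 32, 32))
out, gate = squeeze_excitation(
    x,
    rng.standard_normal((C // r, C)), np.zeros(C // r),
    rng.standard_normal((C, C // r)), np.zeros(C),
)
print(out.shape, gate.min(), gate.max())
```

Because the gate is computed from globally pooled statistics, each channel is scaled by a single learned factor. This lets the encoder emphasize vessel-responsive channels without changing the spatial resolution of the features.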