Multi-Scale Feature Fusion with Image-Driven Spatial Integration for Left Atrium Segmentation from Cardiac MRI Images

Bipasha Kundu, Zixin Yang, Richard Simon, Cristian Linte

TL;DR

The paper tackles automated left atrium segmentation from LGE cardiac MRI by coupling vision foundation model encoders (DINOv2) with a UNet-style decoder. It introduces a learnable, multi-scale feature fusion mechanism that dynamically prioritizes encoder blocks and reintroduces the original input image during decoding to preserve high-resolution spatial details. Empirical results on LAScarQS 2022 show that DINOv2-giant with the proposed fusion and input-image integration outperforms the nnUNet baseline in Dice and IoU, with statistical significance, and ablations confirm the benefits of selective block weighting and spatial integration. The approach demonstrates improved generalization and accuracy, supporting its potential for clinical workflows in AF management and beyond.
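The Dice and IoU overlap metrics mentioned above can be illustrated on a toy binary mask. This is a minimal sketch of the standard definitions, not the paper's evaluation code; the function name `dice_and_iou`, the `eps` smoothing term, and the example masks are assumptions made for illustration.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two binary masks of the same shape.

    `eps` is a hypothetical smoothing term that avoids division by zero
    when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    dice = (2.0 * intersection + eps) / (total + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)

# Toy masks: 3 foreground pixels each, 2 of them shared.
pred = np.array([[0, 1, 1],
                 [0, 1, 0]])
target = np.array([[0, 1, 0],
                   [0, 1, 1]])
dice, iou = dice_and_iou(pred, target)
print(f"Dice={dice:.3f}, IoU={iou:.3f}")
```

For a single pair of masks the two scores are tied together by IoU = Dice / (2 - Dice), so Dice is always at least as large as IoU.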

Abstract

Accurate segmentation of the left atrium (LA) from late gadolinium-enhanced (LGE) magnetic resonance imaging (MRI) plays a vital role in visualizing diseased atrial structures, enabling the diagnosis and management of cardiovascular diseases. It is particularly essential for planning treatment with ablation therapy, a key intervention for atrial fibrillation (AF). However, manual segmentation is time-intensive and prone to inter-observer variability, underscoring the need for automated solutions. Class-agnostic foundation models such as DINOv2 have demonstrated remarkable feature extraction capabilities in vision tasks. Their lack of domain specificity and task-specific adaptation, however, can reduce spatial resolution during feature extraction, impairing the capture of fine anatomical detail in medical imaging. To address this limitation, we propose a segmentation framework that integrates DINOv2 as an encoder with a UNet-style decoder, incorporating multi-scale feature fusion and input image integration to enhance segmentation accuracy. The learnable weighting mechanism dynamically prioritizes hierarchical features from different encoder blocks of the foundation model, optimizing feature selection for task relevance. Additionally, the input image is reintroduced during the decoding stage to preserve high-resolution spatial details, addressing the limitations of downsampling in the encoder. We validate our approach on the LAScarQS 2022 dataset, where the giant architecture achieves a 92.3% Dice score and an 84.1% IoU score, improving on the nnUNet baseline model. These findings emphasize the efficacy of our approach in advancing the field of automated left atrium segmentation from cardiac MRI.
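The two ideas described in the abstract, learnable weighting of encoder blocks and reintroducing the input image during decoding, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify how the weights are normalized or how the image is merged, so the softmax normalization, the nearest-neighbor upsampling, the channel-wise concatenation, and all function names and tensor shapes below are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def fuse_blocks(block_features, block_logits):
    """Weighted sum of same-shaped feature maps from several encoder blocks.

    `block_logits` stands in for the learnable parameters; softmax
    normalization is an assumption, not confirmed by the source.
    """
    weights = softmax(np.asarray(block_logits, dtype=float))
    stacked = np.stack(block_features, axis=0)            # (B, C, H, W)
    fused = np.tensordot(weights, stacked, axes=(0, 0))   # (C, H, W)
    return fused, weights

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def integrate_image(decoder_features, image):
    """Upsample decoder features to image resolution and append the image
    as extra channels (concatenation is an assumed merge strategy)."""
    factor = image.shape[1] // decoder_features.shape[1]
    upsampled = upsample_nearest(decoder_features, factor)
    return np.concatenate([upsampled, image], axis=0)

rng = np.random.default_rng(0)
# Four hypothetical encoder blocks, each producing an 8-channel 16x16 map.
blocks = [rng.standard_normal((8, 16, 16)) for _ in range(4)]
logits = np.zeros(4)   # would be trained in practice; zeros give uniform weights
fused, weights = fuse_blocks(blocks, logits)

image = rng.standard_normal((1, 64, 64))   # single-channel input slice
decoder_input = integrate_image(fused, image)
print(weights, fused.shape, decoder_input.shape)
```

In a trainable model the logits would be parameters updated by gradient descent alongside the decoder, so blocks that carry more task-relevant information end up with larger weights. Here they are fixed at zero purely to keep the example deterministic.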

Paper Structure

This paper contains 11 sections, 3 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Framework of the proposed UNet decoder-based segmentation model with learnable weight blocks, skip connections, input image augmentation, and decoder blocks.
  • Figure 2: Visualization of left atrium segmentation results comparing the baseline model and the proposed framework.