Harmonized Spatial and Spectral Learning for Robust and Generalized Medical Image Segmentation

Vandan Gorade, Sparsh Mittal, Debesh Jha, Rekha Singhal, Ulas Bagci

TL;DR

This work tackles domain generalization in medical image segmentation (MIS) by addressing two sources of failure: intra-class variation and poorly captured inter-class dependencies. It introduces a dual spatial–spectral learning framework built around a Spectral Correlation Coefficient regularizer, which is computed in the frequency domain by applying the FFT to the ground-truth and predicted masks. The regularizer is added to the conventional spatial loss to form $\mathcal{L}_{final} = \mathcal{L}_{spatial} + \lambda \mathcal{L}_{spectral}$. The approach is architecture-agnostic and does not apply the FFT to input images. Validated on eight MIS datasets with two architectures (UNet and TransUNet), the method improves DSC and IoU, calibration, noise robustness, and interpretability, with notable gains in out-of-distribution (OOD) and cross-domain settings. The results show strong generalization across modalities (CT, MRI, skin, histopathology, polyps) and point toward more reliable, interpretable MIS in diverse clinical contexts. The work also outlines future directions: reducing false negatives, combining the objective with semi-supervised or knowledge-distillation techniques, and extending the approach beyond medical imaging.
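
The combined objective $\mathcal{L}_{final} = \mathcal{L}_{spatial} + \lambda \mathcal{L}_{spectral}$ can be illustrated with a minimal NumPy sketch. Several parts are assumptions that this summary does not confirm: the spatial term is taken to be binary cross-entropy, the spectral term is taken to be one minus a Pearson-style correlation between the FFT magnitude spectra of the two masks, and the names `spectral_correlation`, `spatial_bce`, `final_loss`, and the default `lam=0.1` are hypothetical. The paper's exact coefficient may be defined differently.

```python
import numpy as np

def spectral_correlation(y_true, y_pred, eps=1e-8):
    """Pearson-style correlation between FFT magnitude spectra of two masks.

    Assumption: this is one plausible form of a spectral correlation
    coefficient, not necessarily the definition used in the paper.
    """
    f_true = np.abs(np.fft.fft2(y_true))  # magnitude spectrum of the ground-truth mask
    f_pred = np.abs(np.fft.fft2(y_pred))  # magnitude spectrum of the predicted mask
    a = f_true - f_true.mean()
    b = f_pred - f_pred.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps
    return float((a * b).sum() / denom)

def spatial_bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, used here as a stand-in spatial loss (assumption)."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)).mean())

def final_loss(y_true, y_pred, lam=0.1):
    """L_final = L_spatial + lam * L_spectral, with a hypothetical lam."""
    l_spatial = spatial_bce(y_true, y_pred)
    l_spectral = 1.0 - spectral_correlation(y_true, y_pred)
    return l_spatial + lam * l_spectral

if __name__ == "__main__":
    mask = np.zeros((8, 8))
    mask[2:6, 2:6] = 1.0                # toy square "organ"
    uncertain = np.full((8, 8), 0.5)    # toy prediction with no structure
    print("perfect prediction:", final_loss(mask, mask))
    print("uninformative prediction:", final_loss(mask, uncertain))
```

Note that only the masks pass through the FFT, in line with the statement that input images are not transformed. Because this sketch uses magnitude spectra, the spectral term is insensitive to circular shifts of the mask, so localization is left entirely to the spatial term; a variant that keeps phase would behave differently.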

Abstract

Deep learning has demonstrated remarkable achievements in medical image segmentation. However, prevailing deep learning models generalize poorly because of (i) intra-class variations, where the same class appears differently in different samples, and (ii) inter-class independence, which makes it difficult to capture intricate relationships between distinct objects and leads to more false negatives. This paper presents a novel approach that synergizes spatial and spectral representations to enhance domain-generalized medical image segmentation. We introduce the Spectral Correlation Coefficient objective to improve the model's capacity to capture middle-order features and contextual long-range dependencies. This objective complements traditional spatial objectives by incorporating spectral information. Extensive experiments reveal that optimizing this objective with existing architectures such as UNet and TransUNet significantly enhances generalization, interpretability, and noise robustness, producing more confident predictions. For instance, in cardiac segmentation, we observe improvements of 0.81 pp and 1.63 pp (pp = percentage point) in DSC over UNet and TransUNet, respectively. Our interpretability study demonstrates that, in most tasks, UNet optimized with our objective outperforms even TransUNet by introducing global contextual information alongside local details. These findings underscore the versatility and effectiveness of our proposed method across diverse imaging modalities and medical domains.
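
For readers unfamiliar with the metrics quoted above, the following sketch computes DSC (Dice similarity coefficient) and IoU for binary masks using their standard definitions, and expresses a difference between two scores in percentage points. The helper names `dice_and_iou` and `gain_in_pp` are hypothetical, and the masks are toy data rather than results from the paper.

```python
import numpy as np

def dice_and_iou(y_true, y_pred, eps=1e-8):
    """Return (DSC, IoU) for two binary masks.

    DSC = 2|A ∩ B| / (|A| + |B|),  IoU = |A ∩ B| / |A ∪ B|.
    eps keeps both scores defined (and equal to 1) when both masks are empty.
    """
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    inter = np.logical_and(t, p).sum()
    union = np.logical_or(t, p).sum()
    dsc = (2.0 * inter + eps) / (t.sum() + p.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dsc), float(iou)

def gain_in_pp(baseline, improved):
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return 100.0 * (improved - baseline)

if __name__ == "__main__":
    truth = np.zeros((4, 4), dtype=int)
    truth[0:2, 0:2] = 1   # 4 foreground pixels
    pred = np.zeros((4, 4), dtype=int)
    pred[0:2, 1:3] = 1    # 4 foreground pixels, 2 of them overlapping truth
    print(dice_and_iou(truth, pred))
```

A percentage-point gain is an absolute difference between two scores on a 0 to 100 scale, not a relative change, which is how the 0.81 pp and 1.63 pp figures should be read.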

Paper Structure (16 sections, 2 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: (A-1) Appearance disparities within a single class across patient slices; white bounding boxes highlight pancreas variation. (A-2) Variation in ROI across data acquisition centers. (A-3) ROI variation between modalities. (B-1/2/3) Models struggle to capture intricate inter-class relationships; white bounding boxes mark the resulting false negatives.
  • Figure 2: A dense low-frequency spectrum (in the middle) indicates that the mask spectrum retains more object information than the image spectrum.
  • Figure 3: Large variations in spatial space correspond to small variations in spectral space and vice versa.
  • Figure 4: Method Workflow: Starting with image $x$ and mask $y$, an encoder-decoder network generates $\hat{y}$. Transforming to spectral space yields $y_{freq}$ and $\hat{y}_{freq}$. Training involves spatial objective $\mathcal{L}_{spatial}$ between $y$ and $\hat{y}$, alongside spectral objective $\mathcal{L}_{spectral}$ between $y_{freq}$ and $\hat{y}_{freq}$.
  • Figure 5: Segmentation maps for polyp and skin lesion segmentation. Models are trained on Kvasir-SEG and ISIC-18 under IID settings, while PolypGen and ISIC-17 are treated as OOD datasets. Actual and predicted pathological regions are shown in red and green, respectively.
  • ...and 4 more figures