SA-CycleGAN-2.5D: Self-Attention CycleGAN with Tri-Planar Context for Multi-Site MRI Harmonization

Ishrith Gowda, Chunwei Liu

Abstract

Multi-site neuroimaging analysis is fundamentally confounded by scanner-induced covariate shifts, where the marginal distribution of voxel intensities $P(\mathbf{x})$ varies non-linearly across acquisition protocols while the conditional anatomy $P(\mathbf{y}|\mathbf{x})$ remains constant. This is particularly detrimental to radiomic reproducibility, where acquisition variance often exceeds biological pathology variance. Existing statistical harmonization methods (e.g., ComBat) operate in feature space, precluding spatial downstream tasks, while standard deep learning approaches are theoretically bounded by local effective receptive fields (ERF), failing to model the global intensity correlations characteristic of field-strength bias. We propose SA-CycleGAN-2.5D, a domain adaptation framework motivated by the $\mathcal{H}\Delta\mathcal{H}$-divergence bound of Ben-David et al., integrating three architectural innovations: (1) a 2.5D tri-planar manifold injection preserving through-plane gradients $\nabla_z$ at $O(HW)$ complexity; (2) a U-ResNet generator with dense voxel-to-voxel self-attention, surpassing the $O(\sqrt{L})$ receptive field limit of CNNs to model global scanner field biases; and (3) a spectrally-normalized discriminator constraining the Lipschitz constant ($K_D \le 1$) for stable adversarial optimization. Evaluated on 654 glioma patients across two institutional domains (BraTS and UPenn-GBM), our method reduces Maximum Mean Discrepancy (MMD) by 99.1% ($1.729 \to 0.015$) and degrades domain classifier accuracy to near-chance (59.7%). Ablation confirms that global attention is statistically essential (Cohen's $d = 1.32$, $p < 0.001$) for the harder heterogeneous-to-homogeneous translation direction. By bridging 2D efficiency and 3D consistency, our framework yields voxel-level harmonized images that preserve tumor pathophysiology, enabling reproducible multi-center radiomic analysis.
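
The MMD figures quoted in the abstract can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' evaluation code: it uses a biased squared-MMD estimator with an RBF kernel on synthetic feature vectors, whereas the abstract does not state the kernel, the bandwidth, or whether the reported value is MMD or its square. The names `rbf_kernel`, `mmd2_biased`, `relative_reduction`, and the toy mean-matching "harmonization" are hypothetical. Only the final arithmetic check uses numbers from the abstract.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a (n, d) and b (m, d)."""
    # Squared Euclidean distance between every row of a and every row of b.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2_biased(x, y, gamma=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before

# Synthetic stand-ins for per-image feature vectors from two sites (assumption).
rng = np.random.default_rng(0)
site_a = rng.normal(0.0, 1.0, size=(200, 8))
site_b = rng.normal(1.5, 1.0, size=(200, 8))  # shifted "scanner" distribution

# Toy harmonization: match the mean of site B to site A (not the paper's method).
site_b_harmonized = site_b - site_b.mean(0) + site_a.mean(0)

raw = mmd2_biased(site_a, site_b, gamma=0.1)
harmonized = mmd2_biased(site_a, site_b_harmonized, gamma=0.1)
print(f"toy MMD^2 raw={raw:.4f}, harmonized={harmonized:.4f}")

# Arithmetic check of the reported reduction, 1.729 -> 0.015.
print(f"reported reduction: {relative_reduction(1.729, 0.015):.1f}%")
```

The last line reproduces the 99.1% figure from the two reported MMD values; the synthetic part only shows that removing a distribution shift drives the estimate toward zero.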

Paper Structure (40 sections, 3 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: SA-CycleGAN-2.5D generator architecture. The 2.5D tri-planar input ($12$ channels) passes through a convolutional stem, three encoder stages with CBAM modules, nine residual bottleneck blocks (three groups: CBAM, self-attention, CBAM, plus a global self-attention module), and three decoder stages with skip connections. Orange blocks denote self-attention; green blocks denote CBAM. The discriminator (not shown) is a spectrally-normalized multi-scale PatchGAN.
  • Figure 2: Harmonization results ($A{\to}B{\to}A$) across T1, T1CE, T2, FLAIR (rows). Columns: input, baseline translation, +Attention translation, baseline reconstruction, +Attention reconstruction, attention difference map. Structural features are preserved; changes concentrate on global intensity (not anatomy).
  • Figure 3: t-SNE visualization of ResNet-18 features ($n{=}318$). (a) Raw: clear domain separation (98.4% classifier accuracy, MMD = 1.729). (b) Harmonized: domains thoroughly interleaved (59.7% accuracy, MMD = 0.015).
  • Figure 4: Ablation study: Cohen's $d$ effect size across modalities and translation directions. Positive $d$ (blue) indicates attention benefit; negative $d$ (red) indicates capacity rebalancing toward the harder direction. All values satisfy $|d|{>}1.0$, indicating large effects.
  • Figure 5: Radiomics feature scatter: pre- vs. post-harmonization values across 512 IBSI features (first-order, GLCM, shape). Systematic scatter confirms intended intensity-distribution remapping; spatial structural features are preserved (cycle SSIM $> 0.92$).
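
For readers unfamiliar with the effect size reported in Figure 4 and the abstract, the following is a minimal sketch of Cohen's $d$ using the pooled-standard-deviation form. It is an assumption that this variant applies here; the source does not say whether the ablation uses the pooled or a paired formulation. The function name `cohens_d` and the sample values are hypothetical.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical per-subject scores with and without attention (not from the paper).
with_attention = [2.0, 4.0, 6.0]
baseline = [1.0, 3.0, 5.0]
print(cohens_d(with_attention, baseline))  # 0.5: mean difference 1, pooled SD 2
```

Under Cohen's conventional thresholds (0.2 small, 0.5 medium, 0.8 large), the values with $|d| > 1.0$ shown in Figure 4 fall in the large-effect range, and swapping the argument order flips the sign.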