Improving Generalization of Deep Learning for Brain Metastases Segmentation Across Institutions

Yuchen Yang, Shuangyang Zhong, Haijun Yu, Langcuomu Suo, Hongbin Han, Florian Putz, Yixing Huang

Abstract

Background: Deep learning has shown strong potential for automated brain metastases (BM) segmentation; however, models trained at a single institution often perform poorly at external sites due to differences in scanner hardware, imaging protocols, and patient demographics. This work aims to develop a domain adaptation framework that enables reliable BM segmentation across institutions. Methods: We propose a VAE-MMD preprocessing pipeline that combines a variational autoencoder (VAE) with a maximum mean discrepancy (MMD) loss, incorporating skip connections and self-attention mechanisms, followed by nnU-Net segmentation. The method was evaluated on 740 patients from four public databases (Stanford, UCSF, UCLM, and PKG) using domain-classifier accuracy, sensitivity, precision, F1/F2 scores, surface Dice (sDice), and the 95th-percentile Hausdorff distance (HD95). Results: VAE-MMD reduced domain-classifier accuracy from 0.91 to 0.50, indicating successful feature alignment across institutions. Reconstructed volumes achieved a PSNR above 36 dB, preserving anatomical fidelity. Compared with the baseline nnU-Net, the combined method increased the mean F1 by 11.1% (0.700 to 0.778) and the mean sDice by 7.93% (0.7121 to 0.7686), and reduced the mean HD95 by 65.5% (11.33 to 3.91 mm) across all four centers. Conclusions: VAE-MMD effectively reduces cross-institutional data heterogeneity and improves the generalization of BM segmentation across volumetric, detection, and boundary-level metrics without requiring target-domain labels, addressing a key obstacle to the clinical deployment of AI-assisted segmentation.
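The MMD term used to align latent features across institutions can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the RBF kernel, the fixed bandwidth, and the toy feature batches below are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    """Gaussian (RBF) kernel matrix between two feature batches."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=4.0):
    """Biased estimate of the squared maximum mean discrepancy between
    feature batches x (n, d) and y (m, d). Values near zero indicate
    that the two distributions are aligned."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

# Toy latent features: two batches from the same distribution (aligned
# institutions) vs. one batch with a mean shift (domain gap).
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(128, 16))
tgt_same = rng.normal(0.0, 1.0, size=(128, 16))
tgt_shift = rng.normal(1.5, 1.0, size=(128, 16))

print(mmd2(src, tgt_same))   # small: distributions already match
print(mmd2(src, tgt_shift))  # clearly larger: domain shift detected
```

Minimizing such a term over encoder latents pushes institution-specific feature distributions together, which is the effect reflected in the drop of domain-classifier accuracy toward chance level.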

Paper Structure

This paper contains 32 sections, 5 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Representative axial T1-weighted post-contrast MRI slices from four institutions demonstrating heterogeneity in lesion characteristics. Green contours indicate expert-delineated metastases. (a) Stanford: multiple small scattered metastases characteristic of miliary disease pattern. (b) UCSF: mixed lesion sizes with prominent vascular enhancement. (c) UCLM: solitary well-circumscribed metastasis. (d) PKG: large solitary tumor from a patient with primary lung cancer. These differences in lesion burden and tumor morphology contribute to the domain gap that challenges cross-institutional model generalization.
  • Figure 2: VAE-MMD architecture and preprocessing pipeline. (a) The encoder progressively downsamples input volumes through four convolutional blocks (1$\rightarrow$32$\rightarrow$64$\rightarrow$128$\rightarrow$256 channels) with residual connections and self-attention modules. The latent space (512 dimensions) is regularized by KL divergence and aligned across domains via MMD loss. The decoder reconstructs images using transposed convolutions with skip connections from corresponding encoder blocks. (b) VAE-MMD reconstruction for a Stanford case (stanford_Mets_010): original images (top row) are passed through the VAE encoder-decoder to produce reconstructed volumes (middle row) in which institution-specific intensity distributions and scanner-related stylistic variations are normalized while lesion contrast and anatomical structures are preserved (PSNR = 38.55 dB, MSE = 0.000140). The near-black difference maps (bottom row, scale 0--0.5) confirm minimal information loss. These domain-harmonized reconstructions serve directly as nnU-Net inputs, reducing cross-institutional domain shift without requiring any target-domain labels.
  • Figure 3: Domain adaptation evaluation. (a) t-SNE visualization: two-center (top) and four-center (bottom) configurations before and after VAE-MMD, showing progressive inter-institutional mixing. (b) Confusion matrices before and after VAE-MMD: accuracy declines from 91.0% to 50.0%, confirming elimination of institution-specific feature signatures.
  • Figure 4: Comparison of four-center segmentation performance before and after VAE-MMD preprocessing. Blue bars show performance after VAE-MMD; orange bars show the baseline. VAE-MMD consistently improved sensitivity, F1, and surface Dice (sDice) scores at most institutions. The largest HD95 reductions occurred at Stanford, UCSF, and UCLM, indicating that domain adaptation improved boundary delineation.
  • Figure 5: Qualitative comparison of segmentation results before (left, Baseline) and after (right, VAE-MMD) domain adaptation across the four datasets (Stanford, UCSF, UCLM, PKG, top to bottom). Green contours indicate ground-truth annotations; red contours indicate model predictions. Please zoom in on the digital manuscript for better visualization.
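The PSNR reported for the Figure 2 reconstruction follows directly from its MSE under the standard definition, assuming intensities normalized to [0, 1] (peak value 1); a quick consistency check:

```python
import math

def psnr(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error,
    assuming pixel intensities lie in [0, max_val]."""
    return 10.0 * math.log10(max_val ** 2 / mse)

# MSE reported for the Stanford case in Figure 2b:
print(round(psnr(0.000140), 2))  # -> 38.54
```

This is consistent with the caption's 38.55 dB; the 0.01 dB discrepancy reflects rounding of the reported MSE to three significant figures.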