Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation

Yixing Huang, Zahra Khodabakhshi, Ahmed Gomaa, Manuel Schmidt, Rainer Fietkau, Matthias Guckenberger, Nicolaus Andratschke, Christoph Bert, Stephanie Tanadini-Lang, Florian Putz

TL;DR

This paper addresses data heterogeneity in multicenter brain metastases (BM) autosegmentation under privacy constraints. It evaluates learning without forgetting (LWF) as a privacy-preserving continual-learning approach and compares it with naive transfer learning (TL) and single-center baselines on six BM MRI datasets using the DeepMedic architecture. The results show that data heterogeneity limits generalizability and that LWF often balances sensitivity and precision across centers better than naive TL, outperforming it in cross-center tasks and enabling peer-to-peer collaboration without sharing raw data. The study highlights LWF’s potential for practical multicenter deployment of BM autosegmentation, while acknowledging its computational demands and the need for broader bilateral collaboration and QA integration.
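To make the "balance of sensitivity and precision" concrete, the sketch below computes sensitivity, precision, and F1 (their harmonic mean) from lesion-level detection counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def detection_metrics(tp, fp, fn):
    """Return (sensitivity, precision, F1) from lesion-level detection counts.

    tp: detected true metastases, fp: false detections, fn: missed metastases.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sensitivity + precision
    # F1 is the harmonic mean of sensitivity and precision.
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return sensitivity, precision, f1


# Hypothetical counts (NOT from the paper): a model that finds most lesions
# but raises many false alarms, versus a more balanced model.
sensitive_but_imprecise = detection_metrics(tp=90, fp=110, fn=10)
balanced = detection_metrics(tp=85, fp=15, fn=15)
```

With these made-up counts, the first model has the higher sensitivity but the lower F1, because the harmonic mean penalizes the weaker of the two components. This is the kind of trade-off the summary refers to when it contrasts naive TL with LWF.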

Abstract

Objectives: This work explores the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance and assesses the efficacy of an incremental transfer learning technique, learning without forgetting (LWF), for improving model generalizability without sharing raw data.

Materials and methods: Six BM datasets were used for this evaluation: University Hospital Erlangen (UKER), University Hospital Zurich (USZ), Stanford, UCSF, NYU, and the BraTS Challenge 2023 on BM segmentation. First, the multicenter performance of a convolutional neural network (DeepMedic) for BM autosegmentation was established for exclusive single-center training and for training on pooled data. Subsequently, bilateral collaboration was evaluated, in which a UKER pretrained model is shared with another center for further training using transfer learning (TL), either with or without LWF.

Results: For single-center training, average F1 scores of BM detection range from 0.625 (NYU) to 0.876 (UKER) on the respective single-center test data. Mixed multicenter training notably improves F1 scores at Stanford and NYU, with negligible improvement at the other centers. When the UKER pretrained model is applied to USZ, LWF achieves a higher average F1 score (0.839) than naive TL (0.570) and single-center training (0.688) on combined UKER and USZ test data. Naive TL improves sensitivity and contouring accuracy but compromises precision. In contrast, LWF demonstrates commendable sensitivity, precision, and contouring accuracy. Similar performance was observed when the UKER pretrained model was applied to Stanford.

Conclusion: Data heterogeneity results in varying performance in BM autosegmentation, posing challenges to model generalizability. LWF is a promising approach to peer-to-peer privacy-preserving model training.
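The bilateral setup described above can be illustrated with a generic LWF-style objective: the receiving center trains on its own labels while a distillation term keeps the outputs close to those of the frozen, shared pretrained model. This is a minimal NumPy sketch under stated assumptions, not the paper's exact loss, which this summary does not reproduce. The names `lwf_loss`, `lam`, and `temperature`, and their default values, are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with optional temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy of predicted probs against target distributions."""
    return float(-(targets * np.log(probs + eps)).sum(axis=-1).mean())


def lwf_loss(new_logits, labels_onehot, old_logits, lam=1.0, temperature=2.0):
    """Generic LWF-style loss (assumed form, not the paper's formulation).

    new_logits:    per-voxel logits of the model being trained at the new center
    labels_onehot: per-voxel one-hot labels from the new center
    old_logits:    per-voxel logits of the frozen pretrained model on the SAME
                   new-center inputs (no data from the original center is needed)
    """
    # Supervised term: fit the new center's annotations.
    supervised = cross_entropy(softmax(new_logits), labels_onehot)
    # Distillation term: stay close to the pretrained model's softened outputs.
    distill = cross_entropy(
        softmax(new_logits, temperature), softmax(old_logits, temperature)
    )
    return supervised + lam * distill
```

Setting `lam=0` reduces this sketch to plain fine-tuning on the new center, which corresponds to naive TL; a larger `lam` puts more weight on preserving the behavior of the shared model. Only model weights cross institutional boundaries in this scheme, since the old model's outputs are computed locally on the receiving center's data.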

Paper Structure

This paper contains 8 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Exemplary images from different datasets (top: axial slices; bottom: coronal slices). The axial and coronal slices from the same center are not from the same patient. The green arrows indicate true positive brain metastases, and the red arrows indicate suspicious spots that are true negatives.
  • Figure 2: The BM detection and segmentation performance of different models. (a) and (b) show the performance of the UKER and UCSF single-center-training models, respectively, with respect to the amount of training data (number of volumes/patients). (c) and (d) display the BM detection performance and segmentation performance, respectively, with single-center training and mixed training. (e) and (f) display the BM detection and segmentation performance of different methods. The error bars in (c)-(d) indicate the standard deviations.
  • Figure 3: BM autosegmentation examples of different models. The top row displays the results of different models on one exemplary UKER image when the UKER model was shared with USZ, while the second row displays other representative false positive examples of the TL$_{\text{UKER}\Rightarrow\text{USZ}}$ model. The bottom two rows display the results of different models when the UKER model was shared with Stanford. Red areas are false positive segmentations, green areas are true positive segmentations, and the yellow arrows indicate false negative metastases. Subfigure (j) is an example of human annotation errors in the Stanford dataset, where the tiny metastasis indicated by the yellow arrow was not labeled and the annotation mask of the top metastasis is inaccurate.