Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data

Santiago Silva, Boris Gutman, Eduardo Romero, Paul M Thompson, Andre Altmann, Marco Lorenzi

TL;DR

The paper addresses privacy constraints in multi-center brain-imaging research by proposing a privacy-preserving federated framework that performs end-to-end analysis without sharing raw data. It combines distributed data standardization, ADMM-based confounder correction, and federated PCA to estimate a global representation from local data, with the global covariance expressed as $S=\sum_c \mathbf{E}_c \mathbf{E}_c^T$. Validated on synthetic data and applied to subcortical shape/thickness features across ADNI, PPMI, MIRIAD, and UK Biobank, the approach demonstrates consistent recovery of multivariate variability and cross-disease patterns while preserving data privacy. The work advances distributed imaging genetics and enables ENIGMA-like multi-center meta-analyses, with potential extensions to larger-scale multimodal analyses and genetics-informed studies. Overall, this framework enables scalable, privacy-conscious extraction of interpretable multivariate brain-structure signals from distributed cohorts.
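The covariance aggregation mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `local_factor` and `federated_pca`, the truncation rank `k`, and the $1/N$ scaling are assumptions made for illustration, and the data are assumed to be already centered with the global mean. Each center shares only a low-rank factor $\mathbf{E}_c$, and the master sums the products $\mathbf{E}_c \mathbf{E}_c^T$ before the eigendecomposition.

```python
import numpy as np


def local_factor(X_c, n_total, k):
    """Hypothetical per-center step.

    X_c     : (n_c, d) local data, rows already centered with the global mean.
    n_total : total number of samples over all centers (assumed to be known).
    k       : number of local components to share (assumed truncation rank).

    Returns E_c of shape (d, k) such that E_c @ E_c.T is the rank-k
    approximation of X_c.T @ X_c / n_total.
    """
    # Thin SVD: X_c = U diag(s) Vt, so X_c.T @ X_c = V diag(s**2) V.T
    _, s, Vt = np.linalg.svd(X_c, full_matrices=False)
    k = min(k, s.size)
    # Scale each retained right singular vector by its singular value.
    return (Vt[:k].T * s[:k]) / np.sqrt(n_total)


def federated_pca(factors, n_components):
    """Hypothetical master step: sum E_c E_c^T, then eigendecompose."""
    S = sum(E @ E.T for E in factors)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    X = X - X.mean(axis=0)              # global centering (assumed done beforehand)
    centers = np.array_split(X, 3)      # three simulated centers
    factors = [local_factor(X_c, n_total=X.shape[0], k=5) for X_c in centers]
    variances, components = federated_pca(factors, n_components=2)
    print(components.shape)
```

With `k` equal to the full local rank, the summed matrix equals the pooled covariance $\mathbf{X}^T\mathbf{X}/N$; with a smaller `k` it is only an approximation, which is the trade-off between the amount of shared information and the fidelity of the global components.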

Abstract

Databanks worldwide now contain brain images in previously unimaginable numbers. Combined with developments in data science, these massive data provide the potential to better understand the genetic underpinnings of brain diseases. However, different datasets, which are stored at different institutions, cannot always be shared directly due to privacy and legal concerns, thus limiting the full exploitation of big data in the study of brain disorders. Here we propose a federated learning framework for securely accessing and meta-analyzing any biomedical data without sharing individual information. We illustrate our framework by investigating brain structural relationships across diseases and clinical cohorts. The framework is first tested on synthetic data and then applied to multi-centric, multi-database studies including ADNI, PPMI, MIRIAD and UK Biobank, showing the potential of the approach for further applications in distributed analysis of multi-centric cohorts.

Paper Structure

This paper contains 15 sections, 5 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Data flow to obtain: (a) the global statistics $\bar{\mathbf{x}}$ and $\boldsymbol{\sigma}$, (b) the shared parameter matrix $\widehat{\mathbf{W}}$ used to correct for covariates, and (c) the approximated global covariance matrix $\mathbf{S}$. Red node: master; blue nodes: local centers. Arrows denote the data flows from the centers (blue) and from the master (red).
  • Figure 2: Top-left: Mean squared error (MSE) between $\mathbf{W}$ and $\widetilde{\mathbf{W}}$ for different numbers of centers, with $N=2400$, $N_{\textrm{features}} = 50,000$ and $\dim(\mathbf{y}) = 20$. Top-right: A single column of $\mathbf{W}$ vs $\widetilde{\mathbf{W}}$ for $C=100$. Bottom: Principal components (PC) vs federated ones (PC*) for 100 centers.
  • Figure 3: Data projected onto the first 4 components. Top: AD vs controls from different centers. Bottom: Progressive and stable MCI from ADNI. Federated PCA was performed on the whole data obtained from the 4 centers (Table 1).
  • Figure 4: First principal component estimated with the proposed federated framework. The component maps predominantly to the hippocampi and amygdalae. Left: Thickness. Right: Log-Jacobians.
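
Step (a) of Figure 1, the global statistics $\bar{\mathbf{x}}$ and $\boldsymbol{\sigma}$, can be sketched as follows. This is one possible scheme under stated assumptions, not necessarily the protocol used in the paper: each center is assumed to send only its sample count, per-feature sum, and per-feature sum of squares, and the helper names `local_summary` and `global_mean_std` are hypothetical.

```python
import numpy as np


def local_summary(X_c):
    """Hypothetical per-center message: (count, per-feature sum, per-feature sum of squares)."""
    return X_c.shape[0], X_c.sum(axis=0), (X_c ** 2).sum(axis=0)


def global_mean_std(summaries):
    """Hypothetical master step: combine the summaries into a global mean and std."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    # Population variance via E[x^2] - E[x]^2; clipped at zero to absorb round-off.
    var = np.maximum(total_sq / n - mean ** 2, 0.0)
    return mean, np.sqrt(var)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(loc=3.0, scale=2.0, size=(90, 4))
    summaries = [local_summary(X_c) for X_c in np.array_split(X, 4)]
    mean, std = global_mean_std(summaries)
    # Each center could then standardize locally with (X_c - mean) / std.
    print(mean.shape, std.shape)
```

The sum-of-squares formula needs only one communication round but can lose precision when the mean is large relative to the spread; a two-round variant (mean first, then squared deviations) would be a more stable alternative.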