Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data

Santiago Silva; Boris Gutman; Eduardo Romero; Paul M Thompson; Andre Altmann; Marco Lorenzi

TL;DR

The paper addresses privacy constraints in multi-center brain-imaging research by proposing a privacy-preserving federated framework that performs end-to-end analysis without sharing raw data. It combines distributed data standardization, ADMM-based confounder correction, and federated PCA to estimate a global representation from local data, with the global covariance expressed as $S=\sum_c \mathbf{E}_c \mathbf{E}_c^T$. Validated on synthetic data and applied to subcortical shape/thickness features across ADNI, PPMI, MIRIAD, and UK Biobank, the approach demonstrates consistent recovery of multivariate variability and cross-disease patterns while preserving data privacy. The work advances distributed imaging genetics and enables ENIGMA-like multi-center meta-analyses, with potential extensions to larger-scale multimodal analyses and genetics-informed studies. Overall, this framework enables scalable, privacy-conscious extraction of interpretable multivariate brain-structure signals from distributed cohorts.
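The aggregation $S=\sum_c \mathbf{E}_c \mathbf{E}_c^T$ can be illustrated with a minimal NumPy sketch. It assumes that each center shares only a feature-by-component matrix $\mathbf{E}_c$ (local eigenvectors scaled by the square roots of their eigenvalues) together with its feature sums and sample count, and that $S$ is the unnormalized scatter matrix. The function names (`local_moments`, `global_mean`, `local_summary`, `federated_pca`) and the exact scaling are illustrative assumptions, not the authors' implementation, and the ADMM-based confounder correction is omitted.

```python
import numpy as np

def local_moments(X):
    """Center-side: per-feature sums and sample count (no raw rows leave the site)."""
    return X.sum(axis=0), X.shape[0]

def global_mean(moments):
    """Master-side: combine local sums and counts into the global feature mean."""
    total = sum(s for s, _ in moments)
    count = sum(n for _, n in moments)
    return total / count

def local_summary(X, mean, n_components):
    """Center-side: scaled components E_c, so that E_c @ E_c.T approximates
    the local scatter matrix around the shared global mean."""
    Xc = X - mean
    scatter = Xc.T @ Xc                        # features x features
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)    # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)

def federated_pca(summaries, n_components):
    """Master-side: S = sum_c E_c E_c^T, followed by an eigendecomposition of S."""
    S = sum(E @ E.T for E in summaries)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]
```

When every center keeps all of its components, $S$ equals the pooled scatter matrix exactly; truncating `n_components` locally trades accuracy for a smaller shared summary.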

Abstract

Databanks worldwide now contain brain images in previously unimaginable numbers. Combined with developments in data science, these massive datasets offer the potential to better understand the genetic underpinnings of brain diseases. However, datasets stored at different institutions cannot always be shared directly because of privacy and legal concerns, which limits the full exploitation of big data in the study of brain disorders. Here we propose a federated learning framework for securely accessing and meta-analyzing any biomedical data without sharing individual information. We illustrate our framework by investigating brain structural relationships across diseases and clinical cohorts. The framework is first tested on synthetic data and then applied to multi-centric, multi-database studies including ADNI, PPMI, MIRIAD and UK Biobank, showing the potential of the approach for further applications in distributed analysis of multi-centric cohorts.
