A Hybrid Federated Kernel Regularized Least Squares Algorithm
Celeste Damiani, Yulia Rodina, Sergio Decherchi
TL;DR
The paper addresses privacy-preserving learning in a hybrid horizontal-vertical federated setting, where clinical and omics data are distributed across hospitals and omics centers. It develops two kernel-based RRLS procedures within a hybrid federated Conjugate Gradient framework: a fast naive method, and a secure iterative variant that protects the data through aggregated updates and synchronized noise removal. The authors prove that the federated methods converge to the centralized RRLS solution and report competitive performance on several datasets. They also analyze security in terms of Nyström-like landmarks and EDM reconstruction risk, explore defenses such as randomized kernel widths to mitigate leakage, and discuss integrating RRLS into deeper multi-omics pipelines for clinical-omics research.
Abstract
Federated learning is becoming an increasingly viable and accepted strategy for building machine learning models in privacy-critical scenarios such as clinical settings. Often, the data involved is not limited to clinical records but also includes additional omics features (e.g., proteomics). Consequently, data is distributed not only across hospitals but also across omics centers, which are labs capable of generating such additional features from biosamples. This leads to a hybrid setting in which data is partitioned across both samples and features. For this hybrid setting, we present an efficient reformulation of the Kernel Regularized Least Squares algorithm, introduce two variants, and validate them on well-established datasets. Lastly, we discuss security measures to defend against possible attacks.
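For orientation, the centralized baseline that the federated variants are compared against can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel, the convention (K + lam * I) alpha = y for the regularized system, and a textbook Conjugate Gradient solver. The function names (gaussian_kernel, conjugate_gradient, fit_krls, predict_krls) and the parameters lam and sigma are hypothetical choices made for this example, and no federation or privacy mechanism is included.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def fit_krls(X, y, lam=1e-2, sigma=1.0):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y (assumed convention)."""
    K = gaussian_kernel(X, X, sigma)
    return conjugate_gradient(K + lam * np.eye(len(X)), y)

def predict_krls(X_train, alpha, X_new, sigma=1.0):
    """Evaluate the fitted function f(x) = sum_i alpha_i k(x, x_i) at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```

Conjugate Gradient touches the kernel matrix only through matrix-vector products, which is the kind of operation that can in principle be split across parties and aggregated; how the paper actually distributes these products is not shown here.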
