Federated Transfer Learning with Differential Privacy
Mengchu Li, Ye Tian, Yang Feng, Yi Yu
TL;DR
This work studies federated transfer learning under federated differential privacy (FDP), a privacy model that protects each site's data under site-specific privacy constraints without relying on a trusted central server. The goal is to improve learning on a target dataset by leveraging multiple heterogeneous source datasets. The authors formalize a minimax analysis under FDP, define the set of informative sources, and develop adaptive procedures that select those sources, preserving privacy while improving estimation when the sources are similar to the target. They derive rigorous upper and lower bounds for univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression, showing that FDP interpolates between central and local differential privacy and that knowledge transfer yields gains when heterogeneity is small and informative sources are abundant. Numerical experiments support the theoretical predictions, illustrating how FDP balances privacy, heterogeneity, and transfer in both homogeneous and heterogeneous settings, and highlighting practical considerations such as source dropout and adaptive inference.
Abstract
Federated learning has emerged as a powerful framework for analysing distributed data, yet two challenges remain pivotal: heterogeneity across sites and privacy of local data. In this paper, we address both challenges within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of federated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy model, we study three classical statistical problems: univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and quantifying the cost of privacy in each problem, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses account for data heterogeneity and privacy, highlighting the fundamental costs associated with each factor and the benefits of knowledge transfer in federated learning.
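To make the privacy model concrete, the following is a minimal sketch of site-level private mean estimation without a trusted server. It is not the paper's estimator. It assumes homogeneous sites, data clipped to a known range [-bound, bound], the Laplace mechanism at each site, and inverse-variance weighting at the server. The function names `private_site_mean` and `aggregate` and all parameter values are hypothetical.

```python
import numpy as np


def private_site_mean(x, epsilon, bound, rng):
    """Release an epsilon-DP estimate of one site's mean (Laplace mechanism)."""
    x = np.clip(np.asarray(x, dtype=float), -bound, bound)
    # Replacing one record changes the clipped mean by at most 2 * bound / n.
    sensitivity = 2.0 * bound / x.size
    return float(x.mean() + rng.laplace(scale=sensitivity / epsilon))


def aggregate(site_means, site_sizes, epsilons, bound):
    """Combine noisy site means by inverse-variance weighting.

    Uses only public quantities (sizes, privacy budgets, clipping bound),
    so the server never needs access to raw data.
    """
    n = np.asarray(site_sizes, dtype=float)
    eps = np.asarray(epsilons, dtype=float)
    # Variance proxy: sampling term (at most bound^2 / n) plus the Laplace
    # noise variance 2 * scale^2 with scale = 2 * bound / (n * eps).
    variance = bound**2 / n + 2.0 * (2.0 * bound / (n * eps)) ** 2
    weights = 1.0 / variance
    means = np.asarray(site_means, dtype=float)
    return float(np.sum(weights * means) / np.sum(weights))


# Illustrative run with simulated data and assumed site-specific budgets.
rng = np.random.default_rng(0)
sizes, epsilons, bound = [2000, 1500, 1000], [1.0, 0.5, 2.0], 5.0
sites = [rng.normal(loc=0.3, scale=1.0, size=n) for n in sizes]
released = [private_site_mean(x, e, bound, rng) for x, e in zip(sites, epsilons)]
estimate = aggregate(released, sizes, epsilons, bound)
print(estimate)
```

Only the noisy per-site means leave each site, which is the sense in which no trusted central server is required in this sketch. Sites with smaller sample sizes or tighter privacy budgets receive less weight, reflecting the cost of privacy at each site.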
