Distributed Generalized Linear Models: A Privacy-Preserving Approach
Daniel Tinoco, Raquel Menezes, Carlos Baquero
TL;DR
This work addresses privacy-aware model fitting in distributed and streaming environments. It develops a QR-based, privacy-preserving framework for linear regression that supports incremental updates and distributed computation, then extends the approach to generalized linear models (GLMs) by casting iteratively reweighted least squares (IRLS) as a sequence of weighted least-squares problems solved in transformed coordinates. In extensive experiments on simulated and real data (Diamonds, Credit Cards), the distributed methods produce coefficients nearly identical to those of centralized implementations, with negligible differences in mean absolute error (MAE) for both linear models and GLMs, while reducing the amount of data shared. The approach offers a practical, computationally efficient alternative to cryptographic privacy techniques for federated and streaming scenarios under a semi-honest threat model.
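To make the QR-based idea concrete, the following is a minimal sketch of one common way to fit least squares from per-node triangular factors. It is an illustration under stated assumptions, not necessarily the authors' exact protocol: the function names (local_r_factor, merge_r_factors, solve_from_r) and the simulated data are hypothetical, and the sketch says nothing about the formal privacy guarantees. Each node factorizes its own augmented block [X | y] and passes on only the small R factor; stacking the R factors and factorizing again yields the same normal equations as the pooled data.

```python
import numpy as np

def local_r_factor(X, y):
    """Return the triangular factor R of one node's augmented block [X | y].

    Only this (p+1) x (p+1) matrix would leave the node in this sketch.
    """
    A = np.column_stack([X, y])
    return np.linalg.qr(A, mode="r")

def merge_r_factors(r_factors):
    """Combine R factors by stacking them and factorizing again.

    The result R satisfies R.T @ R == sum of A_i.T @ A_i over all nodes.
    """
    return np.linalg.qr(np.vstack(r_factors), mode="r")

def solve_from_r(R):
    """Recover the least-squares coefficients from the merged factor."""
    p = R.shape[1] - 1
    # Leading p x p block plays the role of R for X; last column holds Q.T @ y.
    return np.linalg.solve(R[:p, :p], R[:p, p])

# Hypothetical demo: three nodes, each holding 100 simulated rows.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=300)

node_factors = [local_r_factor(X[i:i + 100], y[i:i + 100]) for i in (0, 100, 200)]
beta_distributed = solve_from_r(merge_r_factors(node_factors))
beta_centralized = np.linalg.lstsq(X, y, rcond=None)[0]
```

The same merge step covers the streaming case in this sketch: a running factor can be updated with a new batch via merge_r_factors([R_running, local_r_factor(X_new, y_new)]), so earlier raw rows never need to be revisited.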
Abstract
This paper presents a novel approach to classical linear regression, enabling model computation from data streams or in a distributed setting while preserving data privacy in federated environments. We extend this framework to generalized linear models (GLMs), ensuring scalability and adaptability to diverse data distributions while maintaining privacy-preserving properties. To assess the effectiveness of our approach, we conduct numerical studies on both simulated and real datasets, comparing our method with conventional maximum likelihood estimation for GLMs using iteratively reweighted least squares. Our results demonstrate the advantages of the proposed method in distributed and federated settings.
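As a rough illustration of how IRLS can be run on distributed data, the sketch below fits a logistic regression in which every iteration is a weighted least-squares problem assembled from per-node R factors. The logistic link, the function names (weighted_r_factor, distributed_irls), the convergence tolerance, and the simulated data are all assumptions made for this example; the paper's own formulation may differ. In this sketch the coordinator sends the current coefficients to each node, and each node returns only the R factor of its weighted block.

```python
import numpy as np

def weighted_r_factor(X, y, beta):
    """One node's contribution to an IRLS step for logistic regression."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = np.clip(mu * (1.0 - mu), 1e-10, None)  # IRLS weights, kept away from zero
    z = eta + (y - mu) / w                     # working response
    sw = np.sqrt(w)[:, None]
    # Scaling rows by sqrt(w) turns weighted LS into ordinary LS on [X | z].
    return np.linalg.qr(sw * np.column_stack([X, z]), mode="r")

def distributed_irls(blocks, p, n_iter=25, tol=1e-10):
    """Run IRLS where each step merges the nodes' weighted R factors."""
    beta = np.zeros(p)
    for _ in range(n_iter):
        stacked = np.vstack([weighted_r_factor(X, y, beta) for X, y in blocks])
        R = np.linalg.qr(stacked, mode="r")
        new_beta = np.linalg.solve(R[:p, :p], R[:p, p])
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Hypothetical demo: simulated binary responses split across three nodes.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(600), rng.normal(size=(600, 2))])
prob = 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, -1.0, 0.8]))))
y = (rng.random(600) < prob).astype(float)

blocks = [(X[i:i + 200], y[i:i + 200]) for i in (0, 200, 400)]
beta_distributed = distributed_irls(blocks, p=3)
beta_centralized = distributed_irls([(X, y)], p=3)  # same routine, single block
```

Passing the pooled data as a single block gives a centralized IRLS baseline from the same routine, which makes it easy to check that splitting the rows across nodes changes the estimate only at the level of floating-point rounding.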
