A Communication-Efficient and Privacy-Aware Distributed Algorithm for Sparse PCA
Lei Wang, Xin Liu, Yin Zhang
TL;DR
The paper tackles sparse PCA in distributed settings by introducing DSSAL1, a communication-efficient, privacy-aware algorithm that extends a subspace-splitting strategy, used to accelerate convergence, to non-smooth objectives on the Stiefel manifold. It combines an ADMM-like scheme of alternating global and local updates with masking of the exchanged messages to protect local data. The authors prove global convergence to stationary points, establish a sublinear convergence rate, and show through extensive experiments that DSSAL1 requires far fewer communication rounds than competing methods. The result is a scalable approach to sparse PCA on large, distributed datasets, with reduced inter-node communication and a lower risk of private-data leakage than some other distributed algorithms.
Abstract
Sparse principal component analysis (PCA) improves interpretability of the classic PCA by introducing sparsity into the dimension-reduction process. Optimization models for sparse PCA, however, are generally non-convex, non-smooth and more difficult to solve, especially on large-scale datasets requiring distributed computation over a wide network. In this paper, we develop a distributed and centralized algorithm called DSSAL1 for sparse PCA that aims to achieve low communication overheads by adapting a newly proposed subspace-splitting strategy to accelerate convergence. Theoretically, convergence to stationary points is established for DSSAL1. Extensive numerical results show that DSSAL1 requires far fewer rounds of communication than state-of-the-art peer methods. In addition, we make the case that since messages exchanged in DSSAL1 are well-masked, the possibility of private-data leakage in DSSAL1 is much lower than in some other distributed algorithms.
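To make the setting concrete, the following is a minimal sketch of a sparse PCA objective that splits across nodes. It assumes a common l1-regularized formulation over matrices with orthonormal columns (the Stiefel manifold) and a column-wise partition of the data; the abstract does not state the exact model, so the penalty, the names local_term and sparse_pca_objective, and all sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def local_term(A_i, X):
    """Smooth term computed from one node's data only: -0.5 * ||A_i^T X||_F^2."""
    return -0.5 * np.linalg.norm(A_i.T @ X, "fro") ** 2

def sparse_pca_objective(blocks, X, mu):
    """Sum of the local smooth terms plus an (assumed) l1 penalty promoting sparsity."""
    smooth = sum(local_term(A_i, X) for A_i in blocks)
    return smooth + mu * np.abs(X).sum()

rng = np.random.default_rng(0)
n, p, d = 20, 3, 4  # features, components, nodes (illustrative sizes)
mu = 0.1            # sparsity weight (illustrative)

# Each node holds a block of columns (samples) of the full data matrix.
blocks = [rng.standard_normal((n, 15)) for _ in range(d)]

# A feasible point on the Stiefel manifold: orthonormal columns via QR.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))

# The distributed sum matches the objective evaluated on the pooled data.
A = np.hstack(blocks)
central = -0.5 * np.trace(X.T @ A @ A.T @ X) + mu * np.abs(X).sum()
distributed = sparse_pca_objective(blocks, X, mu)
```

The sketch only shows why the smooth part of the objective decomposes over nodes. The orthogonality constraint makes the problem non-convex and the l1 term makes it non-smooth, which is the difficulty the abstract describes. It does not implement DSSAL1 itself.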
