Federated Binary Matrix Factorization using Proximal Optimization
Sebastian Dalleiger, Jilles Vreeken, Michael Kamp
TL;DR
This work tackles learning from privacy-sensitive, distributed binary data by introducing Felb, a federated proximal-gradient method for Boolean matrix factorization. It relaxes the Boolean constraints to continuous factors $U_i, V_i$ with entries in $[0,1]$ and uses proximal aggregation to form a global binary core $\widehat{V}$, in two update variants, Felb and Felb_mu. The authors prove global convergence and differential-privacy guarantees, and show in extensive synthetic and real-world experiments that Felb outperforms baselines in reconstruction quality and in robustness under differential privacy. The approach enables scalable, privacy-preserving discovery of interpretable binary patterns across distributed domains such as genomics and recommender systems.
Abstract
Identifying informative components in binary data is an essential task in many research areas, including life sciences, social sciences, and recommendation systems. Boolean matrix factorization (BMF) is a family of methods that performs this task by efficiently factorizing the data. In real-world settings, the data is often distributed across stakeholders and must remain private, which prohibits the straightforward application of BMF. To adapt BMF to this context, we approach the problem from a federated-learning perspective, building on a state-of-the-art continuous relaxation of BMF that enables efficient gradient-based optimization. We propose to share only the relaxed component matrices, which are aggregated centrally using a proximal operator that regularizes for binary outcomes. We show the convergence of our federated proximal gradient descent algorithm and provide differential privacy guarantees. Our extensive empirical evaluation demonstrates that our algorithm outperforms, in terms of quality and efficacy, federation schemes of state-of-the-art BMF methods on a diverse set of real-world and synthetic data.
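To make the scheme described in the abstract concrete, the following is a minimal NumPy sketch of one federated round: each client runs proximal gradient steps on its private data, only the relaxed component matrix leaves the client, and the server averages these matrices and applies a proximal step that pulls entries toward 0 or 1. It is an illustration under stated assumptions, not the authors' implementation. The function names, the squared reconstruction loss, plain averaging on the server, the step sizes, and in particular the simple `prox_binary` operator are all hypothetical stand-ins; the paper's actual regularizer and update rules may differ.

```python
import numpy as np

def prox_binary(x, step):
    """Illustrative proximal step (an assumption, not the paper's operator):
    pull each entry toward its nearest value in {0, 1}, then clip to [0, 1]."""
    target = (x > 0.5).astype(x.dtype)
    # Closed-form minimizer of 0.5*(z - x)^2 + 0.5*step*(z - target)^2 over z.
    pulled = (x + step * target) / (1.0 + step)
    return np.clip(pulled, 0.0, 1.0)

def local_update(A, U, V, lr, step, n_iters=10):
    """Client side: alternating proximal gradient steps on 0.5*||A - U V||_F^2.
    A is the client's private binary data and never leaves this function."""
    for _ in range(n_iters):
        R = U @ V - A                      # residual with current factors
        U = prox_binary(U - lr * (R @ V.T), step)
        R = U @ V - A                      # recompute after updating U
        V = prox_binary(V - lr * (U.T @ R), step)
    return U, V

def aggregate(Vs, step):
    """Server side: average the shared relaxed components, then apply the
    binary-promoting proximal step to the average."""
    return prox_binary(np.mean(Vs, axis=0), step)

def federated_round(clients, V_global, lr=0.01, step=0.1):
    """One communication round. Each client keeps its own U and only
    returns its relaxed component matrix V_i."""
    Vs = []
    for c in clients:
        c["U"], V_i = local_update(c["A"], c["U"], V_global.copy(), lr, step)
        Vs.append(V_i)
    return aggregate(Vs, step)
```

In this sketch, repeated calls to `federated_round` would be followed by rounding the final global matrix at 0.5 to obtain binary components. A differentially private variant would additionally perturb each `V_i` before sharing; that step is omitted here because the abstract does not specify the mechanism.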
