Federated Binary Matrix Factorization using Proximal Optimization

Sebastian Dalleiger, Jilles Vreeken, Michael Kamp

TL;DR

This work tackles learning from privacy-sensitive, distributed binary data by introducing Felb, a federated proximal-gradient method for Boolean matrix factorization. It relaxes the Boolean constraints to continuous matrices $U_i, V_i$ with entries in $[0,1]$ and uses proximal aggregation to form a global binary core $\widehat{V}$, with two update variants, Felb and Felb$^\textsc{mu}$. The authors prove global convergence and differential-privacy guarantees, and show through extensive synthetic and real-world experiments that Felb outperforms baselines in reconstruction quality and privacy-robustness. The approach enables scalable, privacy-preserving discovery of interpretable binary patterns across distributed domains such as genomics and recommender systems.

Abstract

Identifying informative components in binary data is an essential task in many research areas, including life sciences, social sciences, and recommendation systems. Boolean matrix factorization (BMF) is a family of methods that performs this task by efficiently factorizing the data. In real-world settings, the data is often distributed across stakeholders and required to stay private, prohibiting the straightforward application of BMF. To adapt BMF to this context, we approach the problem from a federated-learning perspective, while building on a state-of-the-art continuous binary matrix factorization relaxation to BMF that enables efficient gradient-based optimization. We propose to only share the relaxed component matrices, which are aggregated centrally using a proximal operator that regularizes for binary outcomes. We show the convergence of our federated proximal gradient descent algorithm and provide differential privacy guarantees. Our extensive empirical evaluation demonstrates that our algorithm outperforms, in terms of quality and efficacy, federation schemes of state-of-the-art BMF methods on a diverse set of real-world and synthetic data.
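The scheme described in the abstract (local relaxed factors, only the component matrix shared, central aggregation through a proximal operator that favors binary values) can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the function names, the step size `lr`, the strength `kappa`, the squared-error loss, and the specific proximal step (the proximal operator of $\kappa \cdot \min(|x|, |x-1|)$, followed by clipping to $[0,1]$) are all assumptions chosen for illustration.

```python
import numpy as np

def prox_binary(x, kappa):
    """Assumed stand-in regularizer: move each entry toward its nearest
    value in {0, 1} by at most kappa, then clip to [0, 1]."""
    target = (x > 0.5).astype(float)
    step = np.clip(target - x, -kappa, kappa)
    return np.clip(x + step, 0.0, 1.0)

def client_step(A, U, V, lr, kappa):
    """One local projected-gradient step on 0.5 * ||A - U V||_F^2
    (hypothetical local objective), followed by the proximal step."""
    U = np.clip(U - lr * ((U @ V - A) @ V.T), 0.0, 1.0)
    V = np.clip(V - lr * (U.T @ (U @ V - A)), 0.0, 1.0)
    return prox_binary(U, kappa), prox_binary(V, kappa)

def server_aggregate(Vs, kappa):
    """Average the shared relaxed components and apply the proximal step."""
    return prox_binary(np.mean(Vs, axis=0), kappa)

def run_demo(n_clients=3, n=20, m=12, k=3, rounds=30, seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic ground truth: a shared Boolean component matrix V_true.
    V_true = (rng.random((k, m)) < 0.3).astype(float)
    data = []
    for _ in range(n_clients):
        U_true = (rng.random((n, k)) < 0.3).astype(float)
        data.append(np.minimum(U_true @ V_true, 1.0))  # Boolean product
    Us = [rng.random((n, k)) for _ in range(n_clients)]
    V_global = rng.random((k, m))
    for _ in range(rounds):
        Vs = []
        for i, A in enumerate(data):
            # U_i never leaves the client; only V_i is sent to the server.
            Us[i], V_i = client_step(A, Us[i], V_global, lr=0.01, kappa=0.01)
            Vs.append(V_i)
        V_global = server_aggregate(Vs, kappa=0.01)
    return V_global, Us

if __name__ == "__main__":
    V, _ = run_demo()
    print(np.round(V).astype(int))  # thresholded view of the global component
```

In this sketch each client keeps its data `A` and its factor `U` local, which mirrors the abstract's proposal to share only the relaxed component matrices. A real implementation would also need the privacy mechanism and the step-size and regularization schedules from the paper, none of which are reproduced here.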

Paper Structure

This paper contains 33 sections, 8 theorems, 36 equations, 11 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

For the sequence $\{z^t \triangleq (\{U^t_i\}_i, \{V^t_i\}_i, \bar{V}^t)\}_{t\in\mathbb{N}}$ generated by Alg. \ref{alg:feddc}, the objective function $\Phi(z^t)$ converges to a stable solution, $\Phi(z^t) \to \widehat{\Phi}$, as $t \to \infty$.

Figures (11)

  • Figure 1: Our method reconstructs the input well. Representing $1$s as black pixels, for (a) Asso using logical OR and (b) our novel federated factorization called Felb, we show (top row) the client data subjected to additive noise, (middle row) the localized reconstructions, and (bottom row) the aggregation-based reconstructions. The left-most column shows the centralized combination of the data, or respectively the reconstructions, of the five clients (columns 2--6).
  • Figure 2: Felb and Felb$^\textsc{mu}$ are robust against noise. We show the loss, recall, similarity, and elapsed runtime ($s/C$) for synthetic data with varying levels of destructive XOR noise.
  • Figure 3: Felb and Felb$^\textsc{mu}$ perform well across various client counts. We show RMSD and runtime ($s/C$). For data scarcity, we fix the data size and increase the number of clients. For data abundance, we grow the data while increasing the number of clients.
  • Figure 4: Felb and Felb$^\textsc{mu}$ achieve accurate yet differentially-private reconstructions. For synthetic data, we subject algorithms to different noise mechanisms: Bernoulli, Laplacian, and Gaussian noise.
  • Figure 5: Felb and Felb$^\textsc{mu}$ perform similarly when we synchronize clients frequently, while Felb$^\textsc{mu}$ tends to improve over Felb if we rarely synchronize. We show the relative RMSD on real-world datasets with varying communication frequencies for Felb and Felb$^\textsc{mu}$.
  • ...and 6 more figures

Theorems & Definitions (17)

  • Definition 1: Differential privacy (cited as dwork2014algorithmic)
  • Theorem 1: Convergence
  • proof
  • Theorem 2: Boolean Convergence
  • proof
  • Theorem 3: Convergence of Alg. \ref{alg:feddc} (restated)
  • proof
  • Theorem 4: Boolean Convergence of Alg. \ref{alg:feddc} (restated)
  • proof
  • Lemma 5: Convergence of client $i$ in Alg. \ref{alg:feddc}
  • ...and 7 more