Enhancing Federated Learning Privacy with QUBO

Andras Ferenczi, Sutapa Samanta, Dagen Wang, Todd Hodges

TL;DR

This paper addresses the privacy risks inherent in federated learning by reducing per-round exposure of client updates through a QUBO-based client selection strategy. By formulating the selection as a quadratic binary optimization problem, it efficiently balances client relevance and redundancy, and introduces ten strategies to navigate exploration–exploitation trade-offs. Empirical results on MNIST (300 clients) and CINIC-10 (30 clients) show substantial per-round privacy improvements (up to 95.2% on MNIST) with minimal or no loss in accuracy, and robustness to data heterogeneity. The work demonstrates a quantum-inspired, privacy-preserving approach to FL that can complement differential privacy and potentially scale with quantum or quantum-inspired hardware, offering practical privacy gains in distributed ML settings.

Abstract

Federated learning (FL) is a widely used method for training machine learning (ML) models in a scalable way while preserving privacy (i.e., without centralizing raw data). Prior research shows that the risk of exposing sensitive data increases cumulatively with the number of iterations in which a client's updates are included in the aggregated model. Attackers can launch membership inference attacks (MIA; deciding whether a sample or client participated), property inference attacks (PIA; inferring attributes of a client's data), and model inversion attacks (MI; reconstructing inputs). In this paper, we mitigate this risk by substantially reducing per-client exposure using a quantum computing-inspired quadratic unconstrained binary optimization (QUBO) formulation that selects a small subset of the most relevant client updates for each training round. We focus on two threat vectors: (i) information leakage by clients during training and (ii) adversaries who can query or obtain the global model. We assume a trusted central server and do not model server compromise. The method also assumes that the server has access to a validation/test set that follows the global data distribution. Experiments on the MNIST dataset with 300 clients over 20 rounds showed a 95.2% per-round and 49% cumulative reduction in privacy exposure, with 147 clients' updates never being used during training, while generally matching or exceeding full-aggregation accuracy. The method also proved effective at a smaller scale and with a more complex model: a CINIC-10 experiment with 30 clients resulted in an 82% per-round and a 33% cumulative privacy improvement.
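
To make the selection idea concrete, the following is a minimal sketch of a QUBO that rewards relevant clients and penalizes redundant pairs. It is not the paper's formulation: the names `build_qubo`, `energy`, and `solve_brute_force`, the `redundancy_weight` parameter, and the toy relevance and similarity values are all assumptions chosen for illustration, and exhaustive search stands in for whichever quantum or quantum-inspired solver is actually used.

```python
from itertools import product


def build_qubo(relevance, similarity, redundancy_weight=1.0):
    """Build an upper-triangular Q so that energy(x) = sum_{i<=j} Q[i][j] x_i x_j.

    Assumed (hypothetical) inputs: relevance[i] scores how useful client i's
    update is; similarity[i][j] scores how redundant clients i and j are.
    """
    n = len(relevance)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal term: selecting a relevant client lowers the energy
        # (x_i * x_i == x_i for binary variables).
        Q[i][i] = -relevance[i]
        for j in range(i + 1, n):
            # Off-diagonal term: selecting two similar clients raises it.
            Q[i][j] = redundancy_weight * similarity[i][j]
    return Q


def energy(Q, x):
    """Evaluate the QUBO objective for a binary selection vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def solve_brute_force(Q):
    """Exhaustively minimize the objective; only feasible for small n."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: energy(Q, x))
    return list(best)


# Toy example: clients 0 and 1 are both relevant but nearly identical.
relevance = [0.9, 0.85, 0.2, 0.6]
similarity = [
    [1.0, 0.95, 0.1, 0.2],
    [0.95, 1.0, 0.1, 0.3],
    [0.1, 0.1, 1.0, 0.0],
    [0.2, 0.3, 0.0, 1.0],
]
Q = build_qubo(relevance, similarity)
selected = solve_brute_force(Q)
print(selected)  # [1, 0, 1, 1]
```

In this toy instance the minimizer keeps client 0 and drops the near-duplicate client 1, which is the relevance-versus-redundancy trade-off described above. Brute force scales as 2^n, so a realistic run with hundreds of clients would need a heuristic or hardware solver.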

Paper Structure

This paper contains 22 sections, 11 equations, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: Privacy-preserving training accuracy comparison between FedAvg, QUBO, and Random approaches across 20 rounds, averaged over 20 runs with different data heterogeneity levels ($\alpha$ values). QUBO maintains competitive accuracy while providing 95.2% privacy preservation (on average, 13.95 of 300 clients expose gradients per round).
  • Figure 2: Privacy-preserving training loss evolution for FedAvg and QUBO approaches over 20 rounds.
  • Figure 3: MNIST maximum accuracy achieved under different data heterogeneity levels ($\alpha$ values) with privacy preservation. QUBO-based selection excels at all heterogeneity levels while maintaining 95.2% per-round privacy preservation, demonstrating optimal privacy-utility tradeoffs even in challenging non-IID scenarios.
  • Figure 4: MNIST gradient variance comparison under privacy constraints.
  • Figure 5: MNIST QUBO strategy selection across training rounds. Our strategy selection scoring (Eq. \ref{eq:scoring}) favors exploration over exploitation for optimal privacy-utility balance.
  • ...and 4 more figures

Theorems & Definitions (1)

  • Definition 1: Privacy-preservation proxy metrics
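
The text of Definition 1 is not reproduced on this page, so the following is only a plausible sketch of how such proxy metrics could be computed, not the paper's definition. It assumes that per-round reduction is one minus the mean fraction of clients whose updates are used per round, and that cumulative reduction is the fraction of clients never selected in any round. The second assumption is at least consistent with the abstract's figures (147 of 300 clients never used, i.e. 49%). The function names are hypothetical.

```python
def per_round_exposure_reduction(selected_per_round, num_clients):
    """Assumed proxy: 1 - (mean number of selected clients per round / total)."""
    rounds = len(selected_per_round)
    mean_selected = sum(len(s) for s in selected_per_round) / rounds
    return 1.0 - mean_selected / num_clients


def cumulative_exposure_reduction(selected_per_round, num_clients):
    """Assumed proxy: fraction of clients never selected in any round."""
    ever_selected = set().union(*selected_per_round)
    return 1.0 - len(ever_selected) / num_clients


# Toy run: 10 clients, 3 rounds, 2 clients selected each round.
history = [{0, 1}, {1, 2}, {0, 2}]
print(per_round_exposure_reduction(history, 10))   # 0.8
print(cumulative_exposure_reduction(history, 10))  # 0.7
```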