A Quantization-based Technique for Privacy Preserving Distributed Learning

Maurizio Colombo, Rasool Asal, Ernesto Damiani, Lamees Mahmoud AlQassem, Al Anoud Almemari, Yousof Alhammadi

TL;DR

The paper tackles data privacy in distributed ML under regulatory regimes by introducing Hash-Comb, a quantization-based data representation that protects both training data and model parameters. It combines randomized quantization with a multi-hash encoding and secures hyperparameters via MPC/secret sharing, enabling regulation-compliant distributed training across architectures. The authors prove that the scheme achieves Rényi differential privacy bounds and show through experiments on SPAM, IoT23, and Cardiovascular datasets that it can improve accuracy and convergence while reducing communication. Compared with classic DP noise baselines, Hash-Comb provides a favorable privacy-utility trade-off with a smaller practical footprint, and is suitable for both monolithic and federated learning lifecycles.
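To make the idea of randomized quantization with a multi-hash encoding concrete, here is a minimal Python sketch. It is not the authors' algorithm: the bin layout (2^level equal-width bins), the salted SHA-256 digest, and the names `sample_level`, `quantize`, and `encode_value` are all assumptions chosen for illustration. The only detail borrowed from the paper is the caption of Figure 1, which describes levels as numbers of consecutive tosses of a biased coin.

```python
import hashlib
import random


def sample_level(p, max_level, rng):
    """Pick a level by counting consecutive successes of a biased coin.

    Assumption: the level starts at 1 and is capped at max_level.
    """
    level = 1
    while level < max_level and rng.random() < p:
        level += 1
    return level


def quantize(x, lo, hi, level):
    """Return the bin index of x when [lo, hi] is split into 2**level equal bins."""
    bins = 2 ** level
    x = min(max(x, lo), hi)          # clamp into the supported range
    idx = int((x - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)        # x == hi falls into the last bin


def encode_value(x, lo, hi, p, max_level, salt, rng):
    """Hypothetical encoder: randomized level, then a salted hash of the bin index."""
    level = sample_level(p, max_level, rng)
    idx = quantize(x, lo, hi, level)
    digest = hashlib.sha256(f"{salt}|{level}|{idx}".encode()).hexdigest()
    return level, digest


if __name__ == "__main__":
    rng = random.Random(0)
    print(encode_value(0.30, 0.0, 1.0, p=0.5, max_level=4, salt="demo", rng=rng))
```

In this sketch, two values that fall into the same bin at the sampled level map to the same digest, so a receiver sees only a coarse, hashed representation of the original value. Coarser levels reveal less about the input at the cost of precision.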

Abstract

The massive deployment of Machine Learning (ML) models raises serious concerns about data protection. Privacy-enhancing technologies (PETs) offer a promising first step, but hard challenges persist in achieving confidentiality and differential privacy in distributed learning. In this paper, we describe a novel, regulation-compliant data protection technique for the distributed training of ML models, applicable throughout the ML life cycle regardless of the underlying ML architecture. Designed from the data owner's perspective, our method protects both training data and ML model parameters by employing a protocol based on a quantized multi-hash data representation, Hash-Comb, combined with randomization. The hyper-parameters of our scheme can be shared using standard Secure Multi-Party Computation protocols. Our experimental results demonstrate the robustness and accuracy-preserving properties of our approach.
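As an illustration of how a hyper-parameter could be shared with a standard Secure Multi-Party Computation building block, the following Python sketch uses additive secret sharing over a prime field. The abstract does not say which protocol the authors use, so this is one common choice and not a description of their implementation. The modulus `PRIME`, the function names, and the example value 8 are assumptions.

```python
import secrets

# Assumption: a Mersenne prime large enough for small integer hyper-parameters.
PRIME = 2**61 - 1


def share(secret, n_parties):
    """Split an integer into n_parties additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME   # forces the shares to sum to the secret
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


if __name__ == "__main__":
    shares = share(8, n_parties=3)   # 8 is an arbitrary example value
    print(reconstruct(shares))
```

Any proper subset of the shares is uniformly distributed and reveals nothing about the value. All parties must combine their shares to reconstruct it.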

Paper Structure (21 sections, 18 equations, 6 figures, 5 tables, 1 algorithm)

This paper contains 21 sections, 18 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Hash-Comb levels as numbers of consecutive tosses of a biased coin.
  • Figure 2: Solution for p when $\overline{k} = 8$.
  • Figure 3: Some of the most relevant features in the SPAM dataset.
  • Figure 4: Training and Testing (last step) from Experiment 1.
  • Figure 5: Model validation score at each FedAvg iteration from Experiment 2.
  • ...and 1 more figures

Theorems & Definitions (1)

  • Definition 1.1