A Quantization-based Technique for Privacy Preserving Distributed Learning
Maurizio Colombo, Rasool Asal, Ernesto Damiani, Lamees Mahmoud AlQassem, Al Anoud Almemari, Yousof Alhammadi
TL;DR
The paper tackles data privacy in distributed ML under regulatory regimes by introducing Hash-Comb, a quantization-based data representation that protects both training data and model parameters. It combines randomized quantization with a multi-hash encoding and secures hyperparameters via MPC/secret sharing, enabling regulation-compliant distributed training across architectures. The authors prove that the scheme achieves Rényi differential privacy bounds and show through experiments on SPAM, IoT23, and Cardiovascular datasets that it can improve accuracy and convergence while reducing communication. Compared with classic DP noise baselines, Hash-Comb provides a favorable privacy-utility trade-off with a smaller practical footprint, and is suitable for both monolithic and federated learning lifecycles.
Abstract
The massive deployment of Machine Learning (ML) models raises serious concerns about data protection. Privacy-enhancing technologies (PETs) offer a promising first step, but hard challenges persist in achieving confidentiality and differential privacy in distributed learning. In this paper, we describe a novel, regulation-compliant data protection technique for the distributed training of ML models, applicable throughout the ML life cycle regardless of the underlying ML architecture. Designed from the data owner's perspective, our method protects both training data and ML model parameters by employing a protocol based on a quantized multi-hash data representation, Hash-Comb, combined with randomization. The hyper-parameters of our scheme can be shared using standard secure multi-party computation (MPC) protocols. Our experimental results demonstrate the robustness and accuracy-preserving properties of our approach.
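The abstract does not spell out how Hash-Comb is built, so the following is only an illustrative sketch of what a "quantized multi-hash representation combined with randomization" could look like, not the authors' construction. Everything in it is an assumption made for illustration: the function names (`quantize`, `randomized_bin`, `hash_comb`), the doubling of the bin count at each level, the use of SHA-256, the `salt` parameter (standing in for a hyper-parameter the parties would agree on, for example via MPC), and the randomized-response step controlled by `flip_prob`. The sketch makes no claim about the privacy bounds proved in the paper.

```python
import hashlib
import random


def quantize(value, lo, hi, levels):
    """Map a value in [lo, hi] to a bin index in 0..levels-1 (uniform bins)."""
    if not lo <= value <= hi:
        raise ValueError("value outside [lo, hi]")
    width = (hi - lo) / levels
    # The upper bound hi would land in bin `levels`, so clamp it to the last bin.
    return min(int((value - lo) / width), levels - 1)


def randomized_bin(value, lo, hi, levels, flip_prob, rng):
    """Assumed randomization step: with probability flip_prob, report a
    uniformly random bin instead of the true one (randomized response)."""
    true_bin = quantize(value, lo, hi, levels)
    if rng.random() < flip_prob:
        return rng.randrange(levels)
    return true_bin


def hash_comb(value, lo, hi, depth, salt, flip_prob=0.0, rng=None):
    """Encode a value as one salted hash per resolution level.

    Level k uses 2**k bins (an assumption), so early digests reveal only
    coarse membership and later digests reveal finer membership.
    """
    if rng is None:
        rng = random.Random()
    digests = []
    for level in range(1, depth + 1):
        bin_index = randomized_bin(value, lo, hi, 2 ** level, flip_prob, rng)
        message = f"{salt}|{level}|{bin_index}".encode()
        digests.append(hashlib.sha256(message).hexdigest())
    return digests


if __name__ == "__main__":
    a = hash_comb(0.30, 0.0, 1.0, depth=3, salt="shared-secret")
    b = hash_comb(0.40, 0.0, 1.0, depth=3, salt="shared-secret")
    # Count how many levels the two encodings agree on.
    print(sum(x == y for x, y in zip(a, b)))
```

With `flip_prob=0.0`, two nearby values share their coarse-level digests and diverge only at finer levels, which is the intuition behind comparing encoded values without exposing the raw data. Raising `flip_prob` trades that fidelity for deniability about the true bin.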
