Approximated Coded Computing: Towards Fast, Private and Secure Distributed Machine Learning
Houming Qiu, Kun Zhu, Nguyen Cong Luong, Dusit Niyato
TL;DR
SPACDC targets fast, private, and secure distributed learning by combining approximated coded computing with an elliptic-curve-based secure transmission layer. It relaxes the strict recovery threshold of traditional coded distributed computing (CDC) while offering information-theoretic privacy against colluding workers and protecting data in transit via the MEA-ECC scheme. The SPACDC-DL framework integrates these ideas into distributed deep learning, providing convergence guarantees and substantial speedups on MNIST under straggler and adversarial conditions. The approach is relevant to privacy-preserving, scalable distributed ML that needs secure communication in CDC systems.
Abstract
In large-scale distributed machine learning systems, coded computing has attracted widespread attention because it can effectively alleviate the impact of stragglers. However, several emerging problems greatly limit the performance of coded distributed systems. First, colluding workers that share their results with one another can cause serious privacy leakage. Second, few existing works consider the security of data transmission in distributed computing systems. Third, the number of results that must be waited for increases with the degree of the decoding functions. In this paper, we design a secure and private approximated coded distributed computing (SPACDC) scheme that addresses all of these problems simultaneously. Our SPACDC scheme guarantees data security during transmission using a new encryption algorithm based on elliptic curve cryptography. Notably, the SPACDC scheme does not impose a strict constraint on the minimum number of results that must be waited for. An extensive performance analysis is conducted to demonstrate the effectiveness of our SPACDC scheme. Furthermore, we present a secure and private distributed learning algorithm based on the SPACDC scheme, which provides information-theoretic privacy protection for training data. Our experiments show that the SPACDC-based deep learning algorithm achieves a significant speedup over the baseline approaches.
