Approximated Coded Computing: Towards Fast, Private and Secure Distributed Machine Learning

Houming Qiu, Kun Zhu, Nguyen Cong Luong, Dusit Niyato

TL;DR

SPACDC tackles fast, private, and secure distributed learning by combining approximated coded computing with an elliptic-curve-based secure transmission layer. It relaxes the strict recovery threshold of traditional CDC while offering information-theoretic privacy against colluding workers and protecting data in transit via the MEA-ECC scheme. The SPACDC-DL framework integrates these ideas into distributed deep learning, providing convergence guarantees and substantial speedups on MNIST under straggler and adversarial conditions. The approach has practical impact for privacy-preserving, scalable distributed ML with secure communication in CDC systems.
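
The relaxed recovery threshold mentioned above can be illustrated with Berrut's rational interpolant, the tool behind the BACC baseline that the paper compares against. The following Python sketch is not the SPACDC scheme: it omits privacy masks and encryption, and the function names (`berrut`, `approx_decode`), the Chebyshev node choices, and the sign convention (alternating over the sorted returned nodes) are assumptions made for illustration. It shows that the master can form an estimate from whatever subset of worker results arrives, with no minimum count beyond one.

```python
import math

def berrut(nodes, values, z):
    """Evaluate Berrut's rational interpolant through (nodes[i], values[i]) at z.

    nodes must be strictly monotone so the alternating signs follow node order.
    """
    num = den = 0.0
    for i, (x, v) in enumerate(zip(nodes, values)):
        if z == x:  # exactly on a node: the interpolant equals the sample
            return v
        w = (-1.0) ** i / (z - x)
        num += w * v
        den += w
    return num / den

def approx_decode(data, f, N, stragglers):
    """Encode K scalars for N workers, apply f at each worker, decode from non-stragglers."""
    K = len(data)
    # Hypothetical node choice: Chebyshev points of the first kind for the data,
    # second kind for the workers (both strictly decreasing in the index).
    alphas = [math.cos((2 * j + 1) * math.pi / (2 * K)) for j in range(K)]
    betas = [math.cos(i * math.pi / (N - 1)) for i in range(N)]
    shares = [berrut(alphas, data, b) for b in betas]  # coded inputs, one per worker
    returned = [i for i in range(N) if i not in stragglers]
    nodes = [betas[i] for i in returned]
    results = [f(shares[i]) for i in returned]  # each worker computes f on its share
    # Estimate f(data[j]) by evaluating the interpolant of the results at alphas[j].
    return [berrut(nodes, results, a) for a in alphas]

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    exact = [x * x for x in data]
    for stragglers in (set(), {3, 10, 20}, set(range(0, 30, 2))):
        est = approx_decode(data, lambda x: x * x, N=30, stragglers=stragglers)
        err = max(abs(e - t) for e, t in zip(est, exact))
        print(f"{30 - len(stragglers)} results -> max abs error {err:.4f}")
```

The estimates are approximate, unlike the exact recovery of threshold-based codes, and accuracy generally degrades as fewer results are returned.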

Abstract

In large-scale distributed machine learning systems, coded computing has attracted widespread attention because it effectively mitigates the impact of stragglers. However, several emerging problems greatly limit the performance of coded distributed systems. First, colluding workers who share their results with each other can cause serious privacy leakage. Second, few existing works consider the security of data transmission in distributed computing systems. Third, the number of results the master must wait for increases with the degree of the decoding functions. In this paper, we design a secure and private approximated coded distributed computing (SPACDC) scheme that addresses these problems simultaneously. Our SPACDC scheme guarantees data security during transmission using a new encryption algorithm based on elliptic curve cryptography. In particular, the SPACDC scheme does not impose strict constraints on the minimum number of results that must be waited for. An extensive performance analysis demonstrates the effectiveness of our SPACDC scheme. Furthermore, we present a secure and private distributed learning algorithm based on the SPACDC scheme, which provides information-theoretic privacy protection for training data. Our experiments show that the SPACDC-based deep learning algorithm achieves a significant speedup over the baseline approaches.
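
The third problem above, a recovery threshold that grows with the degree of the computed function, is visible in standard Lagrange coded computing (LCC), one of the baselines listed in the paper's comparison. The sketch below is not the SPACDC scheme; it is a minimal scalar LCC example over a prime field, and the field size, evaluation points, and helper names (`lagrange_eval`, `recovery_threshold`, `lcc_run`) are assumptions chosen for illustration. It pads $K$ data values with $T$ random masks, so any $T$ colluding workers learn nothing about the data, and it recovers $f(X_j)=X_j^{d}$ exactly only once $(K+T-1)d+1$ results have arrived, where $d$ is the degree of $f$.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2^31 - 1, used as an illustrative field size

def lagrange_eval(xs, ys, z, p=P):
    """Evaluate at z the unique polynomial of degree < len(xs) through (xs[i], ys[i]) over GF(p)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (z - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def recovery_threshold(K, T, deg):
    """Number of worker results LCC needs for a degree-`deg` polynomial f."""
    return (K + T - 1) * deg + 1

def lcc_run(data, T, deg, N, stragglers, rng):
    """Compute f(x) = x**deg mod P on every data value with N workers, T-private."""
    K = len(data)
    alphas = list(range(1, K + T + 1))              # points carrying data and masks
    betas = list(range(K + T + 1, K + T + 1 + N))   # worker points, disjoint from alphas
    masks = [rng.randrange(P) for _ in range(T)]    # uniform random padding for privacy
    secret = [x % P for x in data] + masks
    shares = [lagrange_eval(alphas, secret, b) for b in betas]
    results = {i: pow(shares[i], deg, P) for i in range(N) if i not in stragglers}
    need = recovery_threshold(K, T, deg)
    if len(results) < need:
        raise ValueError(f"need {need} results, only {len(results)} arrived")
    idx = sorted(results)[:need]                    # any `need` results suffice
    xs = [betas[i] for i in idx]
    ys = [results[i] for i in idx]
    # f(u(z)) has degree (K+T-1)*deg, so `need` points determine it exactly.
    return [lagrange_eval(xs, ys, a) for a in alphas[:K]]

if __name__ == "__main__":
    rng = random.Random(0)
    print(lcc_run([3, 5, 7], T=2, deg=2, N=12, stragglers={0, 4, 11}, rng=rng))
    print([recovery_threshold(3, 2, d) for d in (1, 2, 3)])
```

With $K=3$, $T=2$, and $N=12$ workers, the degree-2 computation tolerates three stragglers, while a degree-3 computation would need 13 results and cannot complete at all. This hard cutoff is what an approximated scheme trades away in exchange for accepting any number of results.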

Paper Structure

This paper contains 33 sections, 3 theorems, 42 equations, 7 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

Given a CDC system with a master and $N$ workers, suppose the SPACDC-DL algorithm is applied to train a DNN model and Assumptions 1, 2, and 3 hold. Then SPACDC-DL satisfies a convergence guarantee in which $\varphi$ is given in Assumption 3, $\Gamma$ is the number of iterations, and $\mathbf{\Theta}^{(\ast)}$ is an optimal parameter.

Figures (7)

  • Figure 1: An illustration of a CDC system based on the proposed matrix encryption algorithm. Therein, the master aims to approximately compute a polynomial over a dataset $\mathbf{X}=[\mathbf{X}^T_0,\mathbf{X}^T_1,\ldots,\mathbf{X}^T_{K-1}]^T$.
  • Figure 2: Visualized example of a DNN with $L$ layers, each of which has $M_l$ neurons for $l=1,2,\ldots,L$.
  • Figure 3: Comparison of the average training time achieved by the CONV-DL, MDS-DL, and MATDOT-DL algorithms and our proposed SPACDC-DL algorithm for training a DNN in distributed computing systems with $N=30$, $T=3$, and $S=0,3,5,7$ stragglers.
  • Figure 4: Comparison of test accuracy obtained by the CONV-DL, MDS-DL, and MATDOT-DL algorithms and our proposed SPACDC-DL algorithm in distributed computing systems with $N=30$, $T=3$, and $S=3,5,7$.
  • Figure 5: Comparison of the decoding complexity of the BACC scheme, LCC scheme, Polynomial codes, SecPoly codes, MatDot codes, and the SPACDC scheme in a CDC system with $m=1000$ and $K$ ranging from $1$ to $36$.
  • ...and 2 more figures

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2
  • Theorem 3