A Review on Machine Unlearning

Haibo Zhang, Toru Nakamura, Takamasa Isohara, Kouichi Sakurai

TL;DR

This paper provides an in-depth review of security and privacy concerns in machine learning models and discusses how to protect users' privacy from being violated when they use machine learning platforms.

Abstract

Recently, an increasing number of laws have come to govern the use of users' private data. For example, Article 17 of the General Data Protection Regulation (GDPR), the right to be forgotten, requires machine learning applications to remove a portion of data from a dataset and retrain the model if the user makes such a request. Furthermore, from a security perspective, the training data for machine learning models, i.e., data that may contain users' private information, should be effectively protected, including by appropriate erasure. Researchers have therefore proposed various privacy-preserving methods, such as machine unlearning, to deal with these issues. This paper provides an in-depth review of the security and privacy concerns in machine learning models. First, we present how machine learning can use users' private data in daily life and the role that the GDPR plays in this problem. Then, we introduce the concept of machine unlearning by describing the security threats to machine learning models and how to protect users' privacy from being violated when they use machine learning platforms. As the core content of the paper, we introduce and analyze current machine unlearning approaches and several representative research results, and discuss them in the context of data lineage. Finally, we discuss future research challenges in this field.

Paper Structure

This paper contains 26 sections, 6 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The necessity of machine unlearning. The red arrow indicates that the attacker can access the training data or parameters of the machine learning model through malicious data injection or information stealing to obtain user privacy or even reconstruct the machine learning model. In this case, according to the orange arrow, the data owner will request to delete specific sensitive data, and the model owner needs to apply machine unlearning methods to remove the requested data.
  • Figure 2: CIA triad in machine learning.
  • Figure 3: A general machine learning system consists of three stages, i.e., feature selection, model training, and prediction.
  • Figure 4: Machine Retraining vs. Machine Unlearning
  • Figure 5: A typical machine learning pipeline consists of three primary stages, i.e., training, inference, and unlearning. First, the initial model $\textit{W}^*$ is trained on the initial dataset $\mathcal{D}_{init}$, and its output is used in the inference stage. Afterward, once a request to delete the data $\mathcal{D}_{m}$ is received, the updated model $\textit{W}^u$ is obtained through the unlearning stage, with the dataset becoming $\mathcal{D} \setminus \mathcal{D}_{m}$. The process indicated by the red arrow applies the updated model $\textit{W}^u$ directly to the inference stage, i.e., approximate unlearning; the process indicated by the green arrow retrains the model from scratch on the new dataset $\mathcal{D} \setminus \mathcal{D}_{m}$, i.e., exact unlearning [mahadevan2021certifiable].
  • ...and 2 more figures
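
The distinction drawn in the Figure 5 caption between exact and approximate unlearning can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the model is a one-parameter least-squares fit $y \approx w x$, the function names (`train`, `exact_unlearn`, `approximate_unlearn`) are hypothetical, and the approximate variant (a few gradient steps from $\textit{W}^*$ on the remaining data) is only one simple stand-in for the many approximate methods the review surveys.

```python
def train(data):
    """Fit y ~ w * x by closed-form least squares (toy stand-in for training)."""
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    return sxy / sxx


def remaining(d_init, forget_idx):
    """Return the initial dataset with the rows requested for deletion removed."""
    return [p for i, p in enumerate(d_init) if i not in forget_idx]


def exact_unlearn(d_init, forget_idx):
    """Exact unlearning: retrain from scratch on the remaining data."""
    return train(remaining(d_init, forget_idx))


def approximate_unlearn(w_star, d_init, forget_idx, steps=5, lr=0.05):
    """Approximate unlearning (hypothetical variant): start from w_star and
    take a few gradient steps on the mean squared error of the remaining data."""
    kept = remaining(d_init, forget_idx)
    w = w_star
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in kept) / len(kept)
        w -= lr * grad
    return w


if __name__ == "__main__":
    # Hypothetical data: the last point plays the role of D_m, the data to forget.
    d_init = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 20.0)]
    forget = {3}

    w_star = train(d_init)                                  # initial model W*
    w_exact = exact_unlearn(d_init, forget)                 # green arrow: retrain
    w_approx = approximate_unlearn(w_star, d_init, forget)  # red arrow: update

    print("W* (initial):", w_star)
    print("retrained (exact):", w_exact)
    print("updated (approximate):", w_approx)
```

In this sketch the updated parameter moves from $\textit{W}^*$ toward the retrained one without reaching it, which is the trade-off the caption points at: retraining guarantees that the removed data has no influence but costs a full training run, while the update is cheaper but only approximates that result.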