Data Deletion for Linear Regression with Noisy SGD

Zhangjie Xia, Chi-Hua Wang, Guang Cheng

TL;DR

This paper presents the perfect deleted point problem for 1-step noisy SGD in the classical linear regression task: find a point in the training dataset such that the model trained on the dataset with that point deleted is identical to the model trained on the full dataset.

Abstract

In the current era of big data and machine learning, it is essential to find ways to shrink the training dataset while preserving training performance, in order to improve efficiency. The challenge lies in providing practical ways to find points that can be deleted without significantly harming the training result or causing problems such as underfitting. We therefore present the perfect deleted point problem for 1-step noisy SGD in the classical linear regression task, which aims to find a point in the training dataset such that the model trained on the dataset with that point deleted is identical to the model trained on the full dataset. We apply the signal-to-noise ratio and suggest that its value is closely related to the selection of the perfect deleted point. We also implement an algorithm based on this observation and empirically show its effectiveness on a synthetic dataset. Finally, we analyze the consequences of deleting the perfect deleted point, specifically how it affects training performance and the privacy budget, thereby highlighting its potential. This research underscores the importance of data deletion and calls for more studies in this field.
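To make the setting concrete, the following is a minimal sketch of one noisy SGD step for linear regression, together with a brute-force search for the point whose deletion changes that step the least. It is an illustration under stated assumptions, not the paper's algorithm: the full-batch mean-squared-error gradient, the Gaussian noise model, and the names `noisy_sgd_step` and `least_disruptive_point` are all hypothetical, and the brute-force comparison is a stand-in for the signal-to-noise-ratio criterion the abstract mentions, which is not reproduced here.

```python
import numpy as np

def noisy_sgd_step(X, y, w, lr, sigma, rng):
    """One gradient step on mean squared error with added Gaussian noise.

    Assumption: full-batch gradient and isotropic noise N(0, sigma^2 I).
    """
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    noise = rng.normal(0.0, sigma, size=w.shape)
    return w - lr * (grad + noise)

def least_disruptive_point(X, y, w, lr):
    """Brute-force stand-in: index whose removal changes the noiseless
    1-step update the least, plus the per-point gaps."""
    rng = np.random.default_rng(0)  # unused randomness, since sigma = 0
    full = noisy_sgd_step(X, y, w, lr, 0.0, rng)
    gaps = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # boolean mask dropping point i
        reduced = noisy_sgd_step(X[keep], y[keep], w, lr, 0.0, rng)
        gaps.append(float(np.linalg.norm(full - reduced)))
    return int(np.argmin(gaps)), gaps

# Usage on a small synthetic dataset (values are arbitrary).
data_rng = np.random.default_rng(1)
X_demo = data_rng.normal(size=(20, 2))
y_demo = X_demo @ np.array([3.0, -1.0]) + 0.1 * data_rng.normal(size=20)
best_idx, demo_gaps = least_disruptive_point(X_demo, y_demo, np.zeros(2), lr=0.1)
print("candidate point to delete:", best_idx)
```

A gap of exactly zero would correspond to a perfect deleted point in the sense of the abstract; on generic data the brute-force search only finds the closest candidate.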

Paper Structure

This paper contains 25 sections, 6 theorems, 21 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

(Upper bound of membership advantage for an $\epsilon$-differentially private algorithm; Yeom et al., 2018) Let $A$ be an $\epsilon$-differentially private algorithm, then we have

Figures (3)

  • Figure 1: Synthetic Dataset
  • Figure 2: Distribution of model weights after 1, 10, 50 steps of noisy SGD using perfect deleted point, randomly deleted point and no deleted point in 100 iterations. The histogram plots the occurrence of model weights in each bin, and the shaded area is the kernel density estimate (KDE) of the distribution.
  • Figure 3: Distribution of model weights after 50 steps of noisy SGD using perfect deleted point when $\alpha=0.05$, with mean = 3.38091 and variance = 0.00077.

Theorems & Definitions (18)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1
  • Definition 4
  • Definition 5
  • Definition 6
  • Lemma 2
  • proof
  • Lemma 3
  • ...and 8 more