
Clipped SGD Algorithms for Performative Prediction: Tight Bounds for Clipping Bias and Remedies

Qiang Li, Michal Yemini, Hoi-To Wai

TL;DR

The paper analyzes clipped SGD in performative prediction, where the data distribution depends on the currently deployed model, and reveals a clipping-induced bias that interacts with the distribution shift. In the strongly convex case, projected clipped SGD (PCSGD) converges at rate $O(1/t)$ to a neighborhood of the performative stable (PS) solution, with a bias magnitude that scales as ${\cal O}(1/({\mu-L\beta})^2)$. In the non-convex case, it converges at rate $O(1/\sqrt{T})$ to a biased stationary point. To counteract this bias, the authors propose step-size tuning for PCSGD and adopt DiceSGD. Through a two-clip mechanism with error feedback, DiceSGD can converge exactly to the PS/SPS solutions and, when differential privacy (DP) noise is added, provides privacy guarantees. The authors also derive the DP implications and present numerical experiments that confirm the predicted bias removal and the trade-off between privacy and performance. Overall, the work extends the understanding of clipped SGD under decision-dependent distribution shifts and offers practical remedies for reaching performative stable solutions under privacy constraints.
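The minimal 1-D sketch below shows how clipping bias arises and how the sensitivity $\beta$ amplifies it. It is not the paper's experiment. The toy problem, the function name `pcsgd_toy` and all constants are hypothetical choices. The sketch makes these assumptions:

- The per-sample loss is $\ell(\theta; z) = \frac{1}{2}(\theta - z)^2$, so $\mu = L = 1$.
- Data respond to the deployed model as $z = \beta\theta + s$, where $s = 10$ with probability $0.1$ and $s = 0$ otherwise, so $\mathbb{E}[s] = 1$.
- The clipping threshold is $c = 2$.
- Gaussian noise of scale $\sigma_{\sf DP}$ is added after clipping.
- Iterates are projected onto $[-R, R]$.
- The step size is $\gamma_t = 4/((1-\beta)(t+1))$.

For this toy, the PS point solves $\theta = \mathbb{E}[z] = \beta\theta + 1$, which gives $\theta_{PS} = 1/(1-\beta)$. The fixed point of the clipped update instead solves $0.9\,u - 0.1 \cdot 2 = 0$ with $u = (1-\beta)\theta$, which gives $\theta_{\rm clip} = (2/9)/(1-\beta)$. The squared gap $(7/9)^2/(1-\beta)^2$ therefore grows like $1/(\mu - L\beta)^2$.

```python
import numpy as np


def pcsgd_toy(beta, T=100_000, c=2.0, sigma_dp=0.0, R=100.0, seed=0):
    """Hypothetical 1-D projected clipped SGD (PCSGD) on a toy performative problem.

    Loss l(theta; z) = 0.5 * (theta - z)**2, so grad = theta - z (mu = L = 1).
    Each sample is drawn from D(theta_t): z = beta * theta_t + s, where
    s = 10 with prob. 0.1 and s = 0 otherwise (E[s] = 1).
    """
    rng = np.random.default_rng(seed)
    s = np.where(rng.random(T) < 0.1, 10.0, 0.0).tolist()  # skewed base samples
    noise = (sigma_dp * rng.standard_normal(T)).tolist()    # Gaussian DP noise (assumed form)
    theta = 0.0
    for t in range(1, T + 1):
        gamma = 4.0 / ((1.0 - beta) * (t + 1))   # decaying step gamma_t
        z = beta * theta + s[t - 1]              # data respond to the deployed model
        g = max(-c, min(c, theta - z))           # clip the scalar gradient to [-c, c]
        theta -= gamma * (g + noise[t - 1])      # noisy clipped step
        theta = max(-R, min(R, theta))           # project onto [-R, R]
    return theta


for beta in (0.0, 0.5, 0.8):
    theta_ps = 1.0 / (1.0 - beta)
    theta_clip = (2.0 / 9.0) / (1.0 - beta)
    theta_T = pcsgd_toy(beta)
    print(f"beta={beta}: theta_T={theta_T:.3f}, predicted={theta_clip:.3f}, "
          f"theta_PS={theta_ps:.3f}, squared gap={(theta_T - theta_ps) ** 2:.3f}")
```

With `sigma_dp=0`, each final iterate should sit near the predicted clipped fixed point rather than near $\theta_{PS}$. Setting `sigma_dp > 0` adds variance. In this toy it does not move the fixed point, because the noise has zero mean.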

Abstract

This paper studies the convergence of clipped stochastic gradient descent (SGD) algorithms with decision-dependent data distributions. Our setting is motivated by privacy-preserving optimization algorithms that interact with performative data, where the prediction models can influence future outcomes. This challenging setting involves the non-smooth clipping operator and non-gradient dynamics due to distribution shifts. We make two contributions in pursuit of a performative stable solution using clipped SGD algorithms. First, we characterize the clipping bias of the projected clipped SGD (PCSGD) algorithm. This bias is caused by the clipping operator and prevents PCSGD from reaching a stable solution. When the loss function is strongly convex, we derive lower and upper bounds on this clipping bias and demonstrate a bias amplification phenomenon: the bias grows with the sensitivity of the data distribution. When the loss function is non-convex, we bound the magnitude of the stationarity bias. Second, we propose remedies to mitigate the bias, either by using an optimal step-size design for PCSGD or by applying the recent DiceSGD algorithm [Zhang et al., 2024]. We also extend our analysis to show that the latter algorithm is free from clipping bias in the performative setting. Numerical experiments verify our findings.
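The error-feedback remedy can be illustrated on a small toy problem. The sketch below is one reading of the DiceSGD update of [Zhang et al., 2024], specialized to one dimension with batch size one. It is not a reference implementation. The function name `dicesgd_toy`, the thresholds `c1 = 2` and `c2 = 10`, the step schedule and the toy distribution are all hypothetical. Each step works as follows:

1. Clip the fresh gradient with threshold `c1`.
2. Add the error state $e_t$, itself clipped with threshold `c2`.
3. Store the part of the gradient that was not applied back into $e_t$.

The toy uses $\ell(\theta; z) = \frac{1}{2}(\theta - z)^2$ with $z = \beta\theta + s$, where $s = 10$ with probability $0.1$ and $s = 0$ otherwise, so $\theta_{PS} = 1/(1-\beta)$. The clipping error is fed back rather than discarded, so the iterate should settle near $\theta_{PS}$ instead of at a biased clipped fixed point.

```python
import numpy as np


def dicesgd_toy(beta, T=200_000, c1=2.0, c2=10.0, sigma_dp=0.0, R=100.0, seed=0):
    """Hypothetical 1-D DiceSGD-style run: two clipping thresholds + error feedback.

    Toy problem: l(theta; z) = 0.5 * (theta - z)**2 with z = beta * theta + s,
    where s = 10 with prob. 0.1 and s = 0 otherwise, so theta_PS = 1 / (1 - beta).
    Returns the last iterate and the last clipping-error state e_T.
    """
    rng = np.random.default_rng(seed)
    s = np.where(rng.random(T) < 0.1, 10.0, 0.0).tolist()   # skewed base samples
    noise = (sigma_dp * rng.standard_normal(T)).tolist()      # Gaussian DP noise (assumed form)
    theta, e = 0.0, 0.0                                       # iterate and error state
    for t in range(1, T + 1):
        gamma = 2.0 / ((1.0 - beta) * (t + 10))               # gentle decaying step (hypothetical)
        z = beta * theta + s[t - 1]                           # sample from D(theta_t)
        g = theta - z                                         # raw stochastic gradient
        v = max(-c1, min(c1, g)) + max(-c2, min(c2, e))       # clipped gradient + clipped feedback
        e = e + g - v                                         # keep the part that was not applied
        theta = theta - gamma * (v + noise[t - 1])
        theta = max(-R, min(R, theta))                        # project onto [-R, R]
    return theta, e


beta = 0.5
theta_T, e_T = dicesgd_toy(beta)
print(f"theta_T={theta_T:.3f}, theta_PS={1.0 / (1.0 - beta):.3f}, e_T={e_T:.2f}")
```

The error state stays bounded here because `c2` exceeds the largest per-step clipping error near $\theta_{PS}$. The feedback therefore repays the clipped-off gradient mass one step later instead of losing it.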

Paper Structure (24 sections, 15 theorems, 115 equations, 4 figures, 1 algorithm)

Key Result

Theorem 3

(Upper bound) Under Assumptions ass:scvx, assu:lips, assu:bndgrd, and assu:w1, suppose that $\beta < \frac{\mu}{L}$. Suppose also that the step sizes $\{\gamma_{t}\}_{t\geq 1}$ are non-increasing and satisfy (i) $\frac{\gamma_{t-1}}{\gamma_{t}} \leq 1 + \frac{\mu - L\beta}{2}\gamma_t$ and (ii) $\gamma_{t} \leq \frac{2}{\mu - L\beta}$. Then … , where $c_1 \mathrel{\mathop:}= 2(c^2 + G^2) + d\,\sigma_{\sf DP}^{2}$ and ${\cal C}_1 \mathrel{\mathop:}= \ldots$
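A small checker can help when choosing a schedule that meets step-size conditions (i) and (ii). The sketch below tests a finite prefix of a candidate sequence. The function name `check_theorem3_steps` and the tolerance `rtol` are hypothetical. The checker covers only the step-size conditions stated above, not the assumptions or the bound itself.

For the schedule $\gamma_t = a/(t+1)$, condition (i) reduces to $1/t \le a(\mu - L\beta)/(2(t+1))$, which holds for every $t \ge 1$ when $a = 4/(\mu - L\beta)$. Condition (ii) is then tight at $t = 1$, since $\gamma_1 = 2/(\mu - L\beta)$.

```python
def check_theorem3_steps(gammas, mu, L, beta, rtol=1e-12):
    """Check the step-size conditions of Theorem 3 on a finite prefix.

    gammas[k] holds gamma_{k+1}; condition (i) is checked on consecutive pairs.
    """
    kappa = mu - L * beta
    if kappa <= 0:                                     # Theorem 3 needs beta < mu / L
        return False
    for k, g in enumerate(gammas):
        if g <= 0 or g > (2.0 / kappa) * (1 + rtol):   # positivity and condition (ii)
            return False
        if k > 0:
            prev = gammas[k - 1]
            if g > prev * (1 + rtol):                  # non-increasing
                return False
            if prev / g > (1.0 + 0.5 * kappa * g) * (1 + rtol):  # condition (i)
                return False
    return True


mu, L, beta = 1.0, 1.0, 0.5
kappa = mu - L * beta
a = 4.0 / kappa
decaying = [a / (t + 1) for t in range(1, 10_001)]    # gamma_t = a / (t + 1)
too_small = [1.0 / (t + 1) for t in range(1, 10_001)] # constant too small for (i)
print(check_theorem3_steps(decaying, mu, L, beta))    # True
print(check_theorem3_steps(too_small, mu, L, beta))   # False
```

A constant step size satisfies (i) trivially, because the ratio is 1. Condition (i) therefore mainly rules out schedules that decay too quickly relative to $\mu - L\beta$.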

Figures (4)

  • Figure 1: Quadratic Minimization. (First) The performative stability gap $\left\Vert {\bm \theta}_t - {\bm \theta}_{PS} \right\Vert^2$. (Second) Trade-off between privacy budget $\varepsilon$ and bias. (Third) Bias amplification effect due to $\beta$.
  • Figure 2: Logistic Regression. (First) Gap between iterates and the performative stable point $\left\Vert {\bm \theta}_t - {\bm \theta}_{PS} \right\Vert^2$. (Second) Test true negative rate with shifted distribution. (Third) Test true positive accuracy with shifted distribution.
  • Figure 3: Behavior of $e_{t}$ with DiceSGD for quadratic minimization (left) and logistic regression (right).
  • Figure 4: Logistic Regression. (Left) Performative risk $V({\bm \theta})$. (Middle) and (Right) Train true negative/positive rates.

Theorems & Definitions (27)

  • Definition 1
  • Definition 2
  • Remark 1
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Definition 6
  • Corollary 1
  • Corollary 2
  • Corollary 3
  • ...and 17 more