Stochastic gradient descent for streaming linear and rectified linear systems with adversarial corruptions

Halyun Jeong; Deanna Needell; Elizaveta Rebrova

TL;DR

This work addresses robust streaming regression for both linear and ReLU models under Massart (semi-random adversarial) corruptions. It introduces SGD-exp, an SGD variant with exponentially decaying step sizes, and proves nearly linear convergence: $\|\mathbf{x}-\mathbf{x}_T\|_2 \lesssim \log T \exp\left(-\frac{T}{d \log^2 T}\right)$ with high probability for corruption probability $p<\tfrac{1}{2}$, extending to any $p<1$ for symmetric oblivious corruptions. The analysis relies on drift arguments for a transformed residual process together with a drift-based moment generating function (MGF) bound, and yields the first convergence guarantees for streaming robust ReLU regression under Massart noise. Experiments on synthetic and real datasets corroborate the theory, showing fast recovery under several corruption types. The result is a practical, theoretically grounded method for real-time regression under adversarial or semi-random corruptions.

Abstract

We propose SGD-exp, a stochastic gradient descent approach for linear and ReLU regression under Massart noise (an adversarial semi-random corruption model) in the fully streaming setting. We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50\%$ Massart corruption rate, and with any corruption rate in the case of symmetric oblivious corruptions. This is the first convergence guarantee for robust ReLU regression in the streaming setting, and it shows an improved convergence rate over previous robust methods for $L_1$ linear regression, due to the choice of an exponentially decaying step size, which is known for its efficiency in practice. Our analysis is based on the drift analysis of a discrete stochastic process, which may be of independent interest.

Paper Structure (29 sections, 11 theorems, 66 equations, 8 figures)

Introduction
Robust linear and ReLU regression
On corruption models
Contribution summary
Related works
SGD with exponential decay step size
Organization
Main results
Model 1: Streaming linear system
Model 2: Streaming ReLU regression
SGD-exp method and main theorems
Background on drift analysis
SGD-exp convergence analysis for linear problem
Initial reductions
SGD drift estimates
...and 14 more sections

Key Result

Theorem 1

Let $\mathbf{x} \in \mathbb{R}^d$ be an unknown vector, and let $y_j = f(\langle \mathbf{x}, \mathbf{a}_j\rangle) + \epsilon_j$ for $j = 1, 2, \ldots$ be streaming measurements of $\mathbf{x}$. Here, $\epsilon_j$ is Massart noise with corruption probability $p < 0.5$, the measurement vectors [...], where $f'$ is a subgradient of $f$, for $T$ iterations. Then, with high probability, we have $\|\mathbf{x}-\mathbf{x}_T\|_2 \lesssim \log T \exp\left(-\frac{T}{d \log^2 T}\right)$.
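The theorem statement above does not show the update rule, so the following is only a minimal sketch of what an SGD-exp run could look like in the linear case $f(t) = t$. It assumes a subgradient step on the absolute loss $|\langle \mathbf{a}_k, \mathbf{x}_k\rangle - y_k|$ (suggested by the abstract's reference to $L_1$ regression) with step size `eta0 * lam**(-k)`. The function name `sgd_exp_linear`, the zero initialization, and the values of `eta0`, `lam`, `d`, and `T` are illustrative assumptions, not taken from the paper. The corruption is the sign flip described in the Figure 1 caption.

```python
import numpy as np


def sgd_exp_linear(x_true, T, p, eta0, lam, seed=0):
    """Hypothetical sketch of SGD with exponentially decaying steps.

    Assumed update (not confirmed by the source):
        x_{k+1} = x_k - eta0 * lam**(-k) * sign(<a_k, x_k> - y_k) * a_k
    """
    rng = np.random.default_rng(seed)
    d = x_true.shape[0]
    x = np.zeros(d)  # assumed initialization at the origin
    errors = np.empty(T)
    for k in range(T):
        # One streaming measurement: a normalized standard Gaussian vector.
        a = rng.standard_normal(d)
        a /= np.linalg.norm(a)
        y = a @ x_true
        # Sign-flip corruption: with probability p the measurement sign is flipped.
        if rng.random() < p:
            y = -y
        # A subgradient of |<a, x> - y| with respect to x is sign(<a, x> - y) * a.
        step = eta0 * lam ** (-k)
        x = x - step * np.sign(a @ x - y) * a
        errors[k] = np.linalg.norm(x - x_true)
    return x, errors


if __name__ == "__main__":
    d = 20
    x_true = np.ones(d) / np.sqrt(d)  # unit-norm target, chosen for illustration
    x_hat, errors = sgd_exp_linear(x_true, T=10_000, p=0.3, eta0=0.5, lam=1.001)
    print("final error:", errors[-1])
```

In this sketch the error can only shrink about as fast as the step size does, so `lam` governs the achievable rate. Values closer to 1 are more conservative, which is consistent with the Figure 2 caption, where a smaller $\lambda$ tolerates a larger corruption probability.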

Figures (8)

Figure 1: Relative error of SGD-exp for (a) a corrupted linear system (left) and (b) a corrupted rectified linear (ReLU) system (right) with corruption probability $p = 0.4$. The blue curves show the relative error when the corruption is a large additive Gaussian noise. The red curves show the sign-flip corruption, i.e., with probability $p$, the sign of the measurement is flipped. The measurement vectors are $100$-dimensional i.i.d. normalized standard Gaussian vectors. The dimension of the signal is $100$ and both plots are averaged over $20$ trials.
Figure 1: Linear regression with SGD-exp and SGD-root on streaming Gaussian data with corruption probability $p = 0.4$ (sign-flip corruptions). For SGD-exp, step size scales $\lambda = 1.00003$, $\lambda = 1.000006$, $\lambda = 1.0000012$. Larger $\lambda$ results in faster convergence but even very small $\lambda$ are more efficient than SGD-root in the long run. The plots are averaged over $20$ runs.
Figure 2: Linear regression with SGD-exp on streaming Gaussian data with different values of the corruption probability $p$ (sign-flip corruptions). With step size $\lambda = 1.00005$ (left) the error of SGD-exp converges linearly for $p \le 0.45$. With more conservative step size $\lambda = 1.00001$ (right) the error of SGD-exp converges linearly for $p \le 0.475$. The plots are averaged over $10$ runs.
Figure 3: Linear regression with SGD-exp on streaming Gaussian data with the step size $\lambda$ recommended by the main convergence theorem. The error of SGD-exp converges approximately linearly for $p \le 0.9$ under the symmetric random oblivious corruption model (left) and for $p \le 0.475$ under sign-flip corruption (right). The plots are averaged over $10$ runs.
Figure 4: Stochastic GLM-Tron in the streaming setting with no corruption ($p=0$) offers recovery of the true parameter for ReLU regression with both square root decaying and exponentially decaying step size scheduling (left). The number of samples in the data set is denoted by $m$. Here, GLM-Tron-exp employs exponentially decaying step size $1.00003^{-k}/m$, GLM-Tron-const has step size $1/m$ and GLM-Tron-root has step size $k^{-1/2}/m$. However, with sign corruption with $p = 0.4$ (right), stochastic GLM-Tron struggles with various step size choices (the error curves are similar). In contrast, the error of ReLU-SGD-exp converges linearly in robust ReLU regression with sign corruption. The plots are averages over $10$ runs.
...and 3 more figures
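
The three step-size schedules named in the Figure 4 caption can be compared directly. The snippet below is a small sketch that tabulates $1.00003^{-k}/m$, $1/m$, and $k^{-1/2}/m$. The function name `step_sizes`, the indexing from $k = 1$, and the value of `m` are assumptions made for illustration.

```python
import numpy as np


def step_sizes(m, num_steps, lam=1.00003):
    """Return the exp, const, and root schedules for k = 1, ..., num_steps."""
    # Indexing from k = 1 is an assumption; it keeps k**(-1/2) finite.
    k = np.arange(1, num_steps + 1, dtype=float)
    return {
        "exp": lam ** (-k) / m,                 # exponentially decaying
        "const": np.full(num_steps, 1.0 / m),   # constant
        "root": k ** (-0.5) / m,                # square-root decaying
    }


if __name__ == "__main__":
    schedules = step_sizes(m=1000, num_steps=300_000)
    for name, values in schedules.items():
        print(name, values[0], values[-1])
```

With $\lambda$ this close to 1, the exponential schedule decays slowly at first and falls below the square-root schedule only after many iterations.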

Theorems & Definitions (31)

Theorem 1
Lemma 2
Proof 1
Remark 1: General covariance case
Remark 2
Remark 3
Theorem 3
Remark 4
Remark 5: On the choice of $\lambda$
Theorem 4
...and 21 more
