Exponential Convergence of (Stochastic) Gradient Descent for Separable Logistic Regression

Sacchit Kale, Piyushi Manupriya, Pierre Marion, Francis Bach, Anant Raj

TL;DR

The paper proves that gradient descent with a simple, non-adaptive increasing step-size schedule achieves exponential convergence for separable logistic regression under a margin condition, while remaining entirely within a stable optimization regime.

Abstract

Gradient descent and stochastic gradient descent are central to modern machine learning, yet their behavior under large step sizes remains theoretically unclear. Recent work suggests that acceleration often arises near the edge of stability, where optimization trajectories become unstable and difficult to analyze. Existing results for separable logistic regression achieve faster convergence by explicitly leveraging such unstable regimes through constant or adaptive large step sizes. In this paper, we show that instability is not inherent to acceleration. We prove that gradient descent with a simple, non-adaptive increasing step-size schedule achieves exponential convergence for separable logistic regression under a margin condition, while remaining entirely within a stable optimization regime. The resulting method is anytime and does not require prior knowledge of the optimization horizon or target accuracy. We also establish exponential convergence of stochastic gradient descent using a lightweight adaptive step-size rule that avoids line search and specialized procedures, improving upon existing polynomial-rate guarantees. Together, our results demonstrate that carefully structured step-size growth alone suffices to obtain exponential acceleration for both gradient descent and stochastic gradient descent.
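As a rough illustration of the setting described above, the following sketch runs plain gradient descent on a tiny linearly separable dataset with a constant step size and with an increasing one, and records at each step whether the loss satisfies $\mathcal{L}(\mathbf{w}_t)\le 1/\eta_t$. The linear schedule `0.1 * (t + 1)`, the toy data, and all function names are assumptions made for this example; the schedule is not the one analyzed in the paper.

```python
import math

def logistic_loss(w, X, y):
    """Empirical logistic loss (1/n) * sum_i ln(1 + exp(-y_i * <x_i, w>))."""
    total = 0.0
    for xi, yi in zip(X, y):
        m = yi * sum(a * b for a, b in zip(xi, w))
        # Numerically stable evaluation of ln(1 + exp(-m)).
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(X)

def logistic_grad(w, X, y):
    """Gradient of the empirical logistic loss with respect to w."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        m = yi * sum(a * b for a, b in zip(xi, w))
        # s = 1 / (1 + exp(m)), written to avoid overflow for large |m|.
        if m >= 0:
            e = math.exp(-m)
            s = e / (1.0 + e)
        else:
            s = 1.0 / (1.0 + math.exp(m))
        for j, a in enumerate(xi):
            g[j] -= s * yi * a / len(X)
    return g

def run_gd(X, y, schedule, steps):
    """Run GD from w = 0; return final w, loss history, and stability flags."""
    w = [0.0] * len(X[0])
    losses = [logistic_loss(w, X, y)]
    stable = []
    for t in range(steps):
        eta = schedule(t)
        # Record whether L(w_t) <= 1 / eta_t holds before taking the step.
        stable.append(losses[-1] <= 1.0 / eta)
        g = logistic_grad(w, X, y)
        w = [wj - eta * gj for wj, gj in zip(w, g)]
        losses.append(logistic_loss(w, X, y))
    return w, losses, stable

# Toy separable data (hypothetical), labels in {-1, +1}.
X = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]

_, losses_const, _ = run_gd(X, y, lambda t: 0.1, 200)
_, losses_inc, stable_inc = run_gd(X, y, lambda t: 0.1 * (t + 1), 200)
```

Comparing `losses_const[-1]` with `losses_inc[-1]` gives a feel for how much step-size growth can help on separable data, and `stable_inc` shows whether the illustrative schedule stayed within the condition $\mathcal{L}(\mathbf{w}_t)\le 1/\eta_t$ on this particular dataset.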

Paper Structure (21 sections, 16 theorems, 184 equations, 4 figures)


Key Result

Lemma 3.1

(Adapted from pmlr-v247-wu24b) Suppose that, for the logistic loss, $\mathcal{L}(\mathbf{w}_k)\le \frac{1}{\eta_k}$ holds for all $k\in [s, t-1]$. Then, under Assumption assumption-gd, we have [...] where $F(\mathbf{w}_{s})\coloneqq \frac{1}{n}\sum_{i=1}^n\exp(-y_i\mathbf{x}_i^\top \mathbf{w}_{s})$.
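The quantities named in this lemma can be made concrete with a short sketch. It assumes labels $y_i\in\{-1,+1\}$ and the standard empirical logistic loss $\mathcal{L}(\mathbf{w})=\frac{1}{n}\sum_{i=1}^n\ln(1+\exp(-y_i\mathbf{x}_i^\top\mathbf{w}))$; the function names and the toy data are hypothetical. Because $\ln(1+u)\le u$ for $u\ge 0$, the logistic loss never exceeds $F$ at the same point.

```python
import math

def margins(w, X, y):
    """Per-example margins y_i * <x_i, w>."""
    return [yi * sum(a * b for a, b in zip(xi, w)) for xi, yi in zip(X, y)]

def logistic_loss(w, X, y):
    """Empirical logistic loss, evaluated in a numerically stable way."""
    ms = margins(w, X, y)
    return sum(max(-m, 0.0) + math.log1p(math.exp(-abs(m))) for m in ms) / len(ms)

def exp_loss(w, X, y):
    """F(w) = (1/n) * sum_i exp(-y_i * <x_i, w>), as defined in the lemma."""
    ms = margins(w, X, y)
    return sum(math.exp(-m) for m in ms) / len(ms)

def is_stable(w, X, y, eta):
    """Check the lemma's hypothesis L(w) <= 1 / eta at a single iterate."""
    return logistic_loss(w, X, y) <= 1.0 / eta

# Toy separable data (hypothetical) and an arbitrary iterate.
X = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = [0.5, 0.5]

L_w = logistic_loss(w, X, y)
F_w = exp_loss(w, X, y)
```

Calling `is_stable(w, X, y, eta)` along a trajectory is one simple way to monitor whether a given step-size sequence satisfies the hypothesis of the lemma.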

Figures (4)

  • Figure 1: Comparison of our GD (\ref{eq:wgd}) and constant step-size gradient descent for logistic regression on a synthetic linearly separable dataset. The plot shows the evolution of the empirical logistic loss $\mathcal{L}(\mathbf{w}_t)$ (log scale) as a function of the iteration $t$.
  • Figure 2: Dynamics of gradient descent for logistic regression on a synthetic linearly separable dataset. Left: Evolution of the empirical loss $\mathcal{L}(\mathbf{w}_t)$ and the inverse step size $1/\eta_t$ in log scale. Right: Plot of $\ln(S_t)$ versus $t^{1/3}$, validating the order of growth of $\ln(S_t)$.
  • Figure 3: Average loss values with SGD (\ref{eq:sgd-update}) for logistic regression on synthetically generated 10-dimensional data.
  • Figure 4: Average loss values with SGD (\ref{eq:sgd-update}) for logistic regression on an MNIST subset for different pairs of labels: (a) Label '4' vs '9'. (b) Label '0' vs '8'. (c) Label '2' vs '6'. (d) Label '1' vs '5'.

Theorems & Definitions (31)

  • Lemma 3.1
  • Theorem 3.2
  • Proof sketch
  • Theorem 3.3
  • Proof sketch
  • Theorem 3.4
  • Proof sketch
  • Lemma A.1
  • Proof
  • Lemma A.2
  • ...and 21 more