
Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization

Liang Zhang, Junchi Yang, Amin Karbasi, Niao He

TL;DR

This work demonstrates that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings.

Abstract

Algorithmic reproducibility measures the deviation in the outputs of machine learning algorithms under minor changes in the training process. Previous work suggests that first-order methods would need to trade off convergence rate (gradient complexity) for better reproducibility. In this work, we challenge this perception and demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings. In particular, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds (optimal reproducibility and near-optimal gradient complexity) for both minimization and minimax optimization. With the inexact gradient oracle, the near-optimal guarantees also hold for minimax optimization. Additionally, with the stochastic gradient oracle, we show that stochastic gradient descent ascent is optimal in terms of both reproducibility and gradient complexity. We believe our results contribute to a better understanding of the reproducibility-convergence trade-off in convex optimization.
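To make the notion of deviation concrete, here is a minimal sketch of how it could be measured under an inexact initialization oracle, where two runs start from points at distance $\delta$. It assumes a convex quadratic objective $F(x) = \tfrac{1}{2}x^\top A x - b^\top x$, plain gradient descent with step size $1/L$ (not the paper's regularized algorithms), and the squared output distance $\lVert x_T - x_T'\rVert^2$ as the deviation measure. The helpers `gradient_descent` and `deviation` are hypothetical names introduced only for illustration.

```python
import numpy as np


def gradient_descent(A, b, x0, step, iters):
    """Plain GD on the assumed objective F(x) = 0.5 x^T A x - b^T x (A symmetric PSD)."""
    x = x0.copy()
    for _ in range(iters):
        x = x - step * (A @ x - b)  # exact gradient step
    return x


def deviation(A, b, x0, x0_prime, step, iters):
    """Squared distance between outputs of two runs that differ only in their initial point."""
    x_T = gradient_descent(A, b, x0, step, iters)
    x_T_prime = gradient_descent(A, b, x0_prime, step, iters)
    return float(np.sum((x_T - x_T_prime) ** 2))


rng = np.random.default_rng(0)
d = 10
M = rng.standard_normal((d, d))
A = M.T @ M                      # symmetric PSD, so F is convex
L = np.linalg.eigvalsh(A).max()  # smoothness constant of F
b = rng.standard_normal(d)

# Inexact initialization: the second run starts at distance exactly delta from the first.
delta = 1e-2
x0 = rng.standard_normal(d)
u = rng.standard_normal(d)
x0_prime = x0 + delta * u / np.linalg.norm(u)

dev = deviation(A, b, x0, x0_prime, step=1.0 / L, iters=500)
print(f"deviation = {dev:.3e}, delta^2 = {delta ** 2:.3e}")
```

For this quadratic, every eigenvalue of $I - A/L$ lies in $[0, 1]$, so each GD step is nonexpansive and the printed deviation should not exceed $\delta^2$. This only illustrates the measurement; it says nothing about the gradient complexity side of the trade-off.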

Paper Structure

This paper contains 44 sections, 32 theorems, 144 equations, 2 figures, 2 tables, and 5 algorithms.

Key Result

Lemma 3.2

Let $x_r^*=\arg\min_{x\in\mathcal{X}} \{F(x) + (r/2)\lVert x-x_0\rVert^2\}$ and $(x_r^*)'=\arg\min_{x\in\mathcal{X}} \{F(x) + (r/2)\lVert x - x_0'\rVert^2\}$. When $F$ is convex, it holds that $\lVert x_r^* - (x_r^*)'\rVert^2 \leq \lVert x_0 - x_0'\rVert^2$ for any $r > 0$.
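Lemma 3.2 says that the map $x_0 \mapsto x_r^*$ is nonexpansive. In the special case $\mathcal{X} = \mathbb{R}^d$ with a convex quadratic $F(x) = \tfrac{1}{2}x^\top A x - b^\top x$ (both assumptions made here only for illustration; the lemma covers general convex $F$ over $\mathcal{X}$), the first-order condition $Ax - b + r(x - x_0) = 0$ gives a closed form, and

$$x_r^* - (x_r^*)' = r(A + rI)^{-1}(x_0 - x_0'),$$

whose matrix has eigenvalues $r/(\lambda_i + r) \in (0, 1]$. The sketch below checks the inequality numerically for several values of $r$; `regularized_minimizer` is a hypothetical helper name.

```python
import numpy as np


def regularized_minimizer(A, b, x0, r):
    """Closed-form argmin over R^d of F(x) + (r/2)||x - x0||^2 for F(x) = 0.5 x^T A x - b^T x.

    Setting the gradient A x - b + r (x - x0) to zero gives (A + r I) x = b + r x0.
    """
    d = A.shape[0]
    return np.linalg.solve(A + r * np.eye(d), b + r * x0)


rng = np.random.default_rng(1)
d = 8
M = rng.standard_normal((d, d))
A = M.T @ M  # symmetric PSD, so F is convex
b = rng.standard_normal(d)
x0 = rng.standard_normal(d)
x0_prime = x0 + 0.1 * rng.standard_normal(d)  # a perturbed initial point

checks = []  # (||x_r^* - (x_r^*)'||^2, ||x0 - x0'||^2) for each r
for r in [1e-3, 1e-1, 1.0, 10.0, 1e3]:
    diff = regularized_minimizer(A, b, x0, r) - regularized_minimizer(A, b, x0_prime, r)
    lhs = float(np.sum(diff ** 2))
    rhs = float(np.sum((x0 - x0_prime) ** 2))
    checks.append((lhs, rhs))
    print(f"r = {r:g}: {lhs:.3e} <= {rhs:.3e}")
```

As $r$ grows, the factor $r/(\lambda_i + r)$ approaches 1 and the bound becomes tight; as $r$ shrinks, the two regularized solutions move closer together in every direction where $\lambda_i > 0$.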

Figures (2)

  • Figure 1: Comparison of GD, AGD, and their regularized versions on the quadratic minimization problem with $\delta$-inexact gradients. The left panel plots convergence and the right panel plots reproducibility. Both axes use a logarithmic scale.
  • Figure 2: Comparison of GDA, EG, and their regularized versions on the bilinear matrix game with $\delta$-inexact gradients. The left panel plots convergence and the right panel plots reproducibility. Both axes use a logarithmic scale.
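The reproducibility comparison in Figure 1 can be illustrated with a small sketch. It assumes that a $\delta$-inexact gradient oracle returns $\nabla F(x) + e$ with $\lVert e\rVert \le \delta$, uses a degenerate quadratic with one flat direction, and gives the two runs opposite worst-case errors along that direction. "Regularized GD" here simply runs GD on $F(x) + (r/2)\lVert x - x_0\rVert^2$ with a fixed $r$; this is a simplified stand-in rather than the paper's algorithm, and all parameter values and helper names (`inexact_gd`, `deviation`) are hypothetical. In the flat coordinate, plain GD with step 1 accumulates a gap of $2\delta T$, while the regularized iteration contracts toward a gap of $2\delta/r$.

```python
import numpy as np


def inexact_gd(grad, x0, step, iters, error):
    """GD where every gradient query returns grad(x) + error (an assumed delta-inexact oracle)."""
    x = x0.copy()
    for _ in range(iters):
        x = x - step * (grad(x) + error)
    return x


# Degenerate convex quadratic F(x) = 0.5 x^T A x; the third coordinate is a flat direction.
A = np.diag([1.0, 0.5, 0.0])
L = 1.0  # largest eigenvalue of A, i.e. the smoothness constant
x0 = np.array([1.0, -2.0, 0.5])
delta, r, T = 1e-3, 1e-2, 2000


def grad_F(x):
    return A @ x


def grad_F_reg(x):
    # Gradient of the regularized objective F(x) + (r/2)||x - x0||^2.
    return A @ x + r * (x - x0)


# The two runs receive errors of norm delta pointing in opposite flat directions.
v = np.array([0.0, 0.0, 1.0])


def deviation(grad, step):
    x_T = inexact_gd(grad, x0, step, T, delta * v)
    x_T_prime = inexact_gd(grad, x0, step, T, -delta * v)
    return float(np.sum((x_T - x_T_prime) ** 2))


dev_gd = deviation(grad_F, 1.0 / L)            # grows like 4 delta^2 T^2
dev_reg = deviation(grad_F_reg, 1.0 / (L + r))  # stays below 4 delta^2 / r^2
print(f"plain GD:       {dev_gd:.3e} (4 delta^2 T^2 = {4 * delta**2 * T**2:.3e})")
print(f"regularized GD: {dev_reg:.3e} (4 delta^2 / r^2 = {4 * delta**2 / r**2:.3e})")
```

The regularizer also pulls the output toward $x_0$, so a larger $r$ improves reproducibility at the cost of accuracy on the original problem; how the paper balances this trade-off, and its accelerated and minimax variants, are not reproduced here.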

Theorems & Definitions (73)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Lemma 3.2
  • Theorem 3.3
  • Remark 1
  • Proposition 3.4
  • ...and 63 more