Beyond Minimax Optimality: A Subgame Perfect Gradient Method

Benjamin Grimmer, Kevin Shu, Alex L. Wang

TL;DR

The paper tackles unconstrained smooth convex minimization from a dynamic perspective, recognizing that worst-case analyses can be overly pessimistic when first-order information is informative. It introduces SPGM, a Subgame Perfect Gradient Method, which refines OGM by exploiting the history of observed gradients and function values to improve convergence guarantees at each iteration. The key contributions include a formal dynamic optimality framework via subgame-perfect equilibria, a constructive SPGM algorithm with a computable per-iteration subproblem, a matching dynamic lower-bound construction, and a limited-memory variant with quantified storage and computational costs. Empirical results show SPGM often outperforms classical first-order methods, highlighting the practical potential of dynamic, history-aware optimization strategies and paving the way for subgame-perfect extensions to other gradient-based schemes.

Abstract

The study of convex optimization has historically been concerned with worst-case convergence rates. The development of the Optimized Gradient Method (OGM), due to \citet{drori2012PerformanceOF,Kim2016optimal}, marked a major milestone in this study, as OGM achieves the optimal worst-case convergence rate among all first-order methods for unconstrained smooth convex optimization. In order to examine the possibility of obtaining stronger convergence guarantees, we will consider algorithms with \emph{dynamic} convergence rates, which may improve as additional first-order information is revealed. Our main contribution is the development of an algorithm, the Subgame Perfect Gradient Method (SPGM), which refines OGM to make use of the full history of first-order information. We show that SPGM is \emph{dynamically optimal}, in the sense that in each iteration, no other algorithm can offer a strictly better convergence rate on all functions which agree with the observed first-order information up to that iteration. We formalize this notion of dynamic optimality using the game-theoretic notion of a subgame perfect equilibrium. We conclude our study with preliminary numerical experiments showing that SPGM strongly outperforms OGM.

Paper Structure

This paper contains 41 sections, 16 theorems, 119 equations, 5 figures, 3 algorithms.

Key Result

theorem 1

For any $0 \leq n \leq N$ and any first-order history ${\cal H}=\left\{(x_i,f_i,g_i)\right\}_{i\in[0,n-1]}$ generated by SPGM, the output $x_N$ of running SPGM for $N-n+1$ additional iterations satisfies a convergence guarantee governed by $\tau_{n,N}$, a quantity that SPGM computes and that depends on ${\cal H}$. Additionally, if $d\geq N+2$, then there exists an $L$-smooth convex function $f:{\mathbb{R}}^d\to{\mathbb{R}}$ ...

Figures (5)

  • Figure 1: Left: The first five iterates of OGM on $f(x) = Lx^2/2$ with $x_0 = 1$. OGM produces $x_4 \approx 0.304$. Right: The first three iterates of SPGM on $f(x) = Lx^2/2$ with $x_0 = 1$. After seeing the history ${\cal H}=\left\{(x_0,f_0,g_0),(x_1,f_1,g_1)\right\}$, SPGM determines that $0$ is a minimizer for any $L$-smooth convex function agreeing with ${\cal H}$. In effect, the history ${\cal H}$ completely determines the function $f$ on the interval $[x_0,x_1]$.
  • Figure 2: Two representative plots of the convergence of $(f(x_n)-f(x_\star))/\frac{L}{2}\|x_0-x_\star\|^2$ for the considered methods. Left: A randomly generated instance of minimizing the logSumExp function \ref{eq:logSumExp} with dimensions $d=256,m=1024$. Right: An instance of logistic regression \ref{eq:logisticRegression} using the LIBSVM dataset "ionosphere" with $d=34,m=351$. On these instances, the limited-memory versions of SPGM and BFGS converged nearly identically to their full-memory counterparts and are omitted.
  • Figure 3: Two representative plots of $1/\tau_{n,N}$ as a function of $n$ as SPGM-10 and SPGM are run on Least Squares Regression problems of dimension $d=512,m=2048$. OGM's constant guarantee, $1/\tau_{0,N}$, is shown as the baseline, non-adaptive guarantee. The left instance is of the form \ref{eq:BasicLeastSquares} with $N=300$ and the right instance is of the form \ref{eq:LSR_HuberL1} with $N=30$.
  • Figure 4: Performance comparison over $42$ randomly generated instances for the problems \ref{eq:BasicLeastSquares}--\ref{eq:MoreauMax}. From left to right, the target accuracy is $10^{-3}$, $10^{-6}$, $10^{-9}$.
  • Figure 5: Performance comparison over twelve instances derived from LIBSVM data for the regression problems \ref{eq:LSR_HuberL1} and \ref{eq:logisticRegression}. From left to right, the target accuracy is $10^{-3}$, $10^{-6}$, $10^{-9}$. Often, the performance of SPGM-10 matched that of SPGM with full memory, and hence the lines overlap.
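
The OGM value quoted in the Figure 1 caption can be checked numerically. The sketch below is not taken from the paper: it assumes the standard OGM recursion of Kim and Fessler, applies the non-final update $\theta_{i+1} = (1+\sqrt{1+4\theta_i^2})/2$ at every step shown (that is, it assumes a horizon $N > 4$), and sets $L = 1$. The function name `ogm_iterates` and its parameters are illustrative only.

```python
import math


def ogm_iterates(grad, x0, L, num_steps):
    """Return [x_0, ..., x_num_steps] for OGM on a 1-D L-smooth convex function.

    Assumption: every step uses the non-final theta recursion, so the
    special last-step update of OGM is never applied here.
    """
    theta = 1.0          # theta_0
    x, y = x0, x0        # OGM starts with y_0 = x_0
    xs = [x0]
    for _ in range(num_steps):
        # Plain gradient step from the current iterate.
        y_next = x - grad(x) / L
        # Non-final momentum parameter update.
        theta_next = (1.0 + math.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
        # OGM combines a Nesterov-style term with an extra correction term.
        x_next = (
            y_next
            + (theta - 1.0) / theta_next * (y_next - y)
            + theta / theta_next * (y_next - x)
        )
        x, y, theta = x_next, y_next, theta_next
        xs.append(x)
    return xs


L = 1.0  # assumed smoothness constant; f(x) = L * x**2 / 2, so grad f(x) = L * x
xs = ogm_iterates(lambda x: L * x, x0=1.0, L=L, num_steps=4)
print([round(v, 3) for v in xs])
```

On this quadratic every gradient step lands exactly at the minimizer $0$, so under the assumptions above the iterates reduce to $x_i = (-1)^i/\theta_i$, and $x_4 = 1/\theta_4$ agrees with the caption's $x_4 \approx 0.304$. OGM keeps overshooting here because its momentum coefficients are fixed in advance, which is the behavior the right panel of Figure 1 contrasts with SPGM.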

Theorems & Definitions (32)

  • theorem 1
  • remark 1
  • lemma 1: taylor2017interpolation
  • remark 2
  • remark 3
  • lemma 2
  • lemma 3
  • theorem 2
  • proof
  • theorem 3
  • ...and 22 more