Second-Order Min-Max Optimization with Lazy Hessians

Lesi Chen, Chengchang Liu, Jingzhao Zhang

TL;DR

This work introduces LEN, a second-order method for convex-concave minimax optimization that reuses Hessian information across iterations to reduce total computation. By solving a cubic-regularized Newton subproblem with a Hessian snapshot updated every $m$ iterations and then performing an extra gradient step, LEN achieves fast global convergence and improved oracle-complexity bounds, notably $\tilde{\mathcal{O}}((N+d^2)(d+ d^{2/3}\epsilon^{-2/3}))$ when $m=\Theta(d)$. The authors extend LEN to the strongly-convex-strongly-concave case via LEN-restart, obtaining $\tilde{\mathcal{O}}((N+d^2)(d+d^{2/3}\kappa^{2/3}))$ complexity, and provide detailed implementation strategies leveraging Schur decompositions and univariate root finding. Numerical experiments on regularized bilinear min-max and fairness-aware learning validate the efficiency and scalability of LEN and LEN-restart, demonstrating practical speedups over existing optimal second-order and first-order methods. Overall, this approach offers a principled, computation-efficient pathway to leverage second-order information in minimax problems, with provable improvements in total cost and robust performance on real-world tasks.
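The update described above (a cubic-regularized Newton step with a Hessian snapshot refreshed every $m$ iterations, followed by an extragradient step) can be sketched on a tiny problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar toy objective $f(x,y)=\tfrac{\rho}{6}|x|^3 + y(ax-b)$, the constants `RHO`, `A`, `B`, the choice `M = 3 * RHO * m`, and all function names are hypothetical. Plain bisection stands in for the Schur-decomposition-based solver, and the 2x2 linear system is solved in closed form.

```python
import math

RHO, A, B = 1.0, 1.0, 1.0  # hypothetical toy constants (assumption)

def F(z):
    """Gradient operator F(z) = (df/dx, -df/dy) for the toy objective."""
    x, y = z
    return ((RHO / 2) * x * abs(x) + A * y, -(A * x - B))

def jac(z):
    """Jacobian of F; its symmetric part is positive semidefinite."""
    x, _ = z
    return ((RHO * abs(x), A), (-A, 0.0))

def norm(v):
    return math.hypot(v[0], v[1])

def cubic_step(g, J, M):
    """Solve g + J s + M*||s||*s = 0 by bisection on lam = M*||s||."""
    gn = norm(g)
    if gn == 0.0:
        return (0.0, 0.0), 0.0

    def s_of(lam):
        # s = -(J + lam*I)^{-1} g via the closed-form 2x2 inverse
        a, b = J[0][0] + lam, J[0][1]
        c, d = J[1][0], J[1][1] + lam
        det = a * d - b * c
        return (-(d * g[0] - b * g[1]) / det, -(-c * g[0] + a * g[1]) / det)

    # phi(lam) = lam - M*||s(lam)|| is increasing for monotone F,
    # negative near 0 and nonnegative at sqrt(M*||g||).
    lo, hi = 0.0, math.sqrt(M * gn)
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if lam - M * norm(s_of(lam)) < 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return s_of(lam), lam

def len_sketch(z0, m, steps):
    """Lazy-Hessian extragradient Newton iteration (illustrative only)."""
    M = 3 * RHO * m  # assumed regularization choice
    z = z0
    for t in range(steps):
        if t % m == 0:
            J = jac(z)  # refresh the Hessian snapshot every m iterations
        s, gamma = cubic_step(F(z), J, M)
        if gamma == 0.0:
            break  # F(z) = 0: already at a saddle point
        z_half = (z[0] + s[0], z[1] + s[1])
        Fh = F(z_half)
        z = (z[0] - Fh[0] / gamma, z[1] - Fh[1] / gamma)  # extragradient step
    return z

z_star = (B / A, -(RHO / 2) * (B / A) * abs(B / A) / A)  # saddle point of the toy problem
z0 = (3.0, 2.0)
z_final = len_sketch(z0, m=5, steps=50)
print(norm((z0[0] - z_star[0], z0[1] - z_star[1])),
      norm((z_final[0] - z_star[0], z_final[1] - z_star[1])))
```

The script prints the distance to the saddle point before and after the run. Only `jac` is called once per $m$ iterations, while `F` is evaluated twice per iteration; that split between cheap and expensive oracle calls is where the savings come from.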

Abstract

This paper studies second-order methods for convex-concave minimax optimization. Monteiro and Svaiter (2012) proposed a method that solves the problem with an optimal iteration complexity of $\mathcal{O}(\epsilon^{-2/3})$ to find an $\epsilon$-saddle point. However, it is unclear whether the computational complexity, $\mathcal{O}((N+ d^2) d \epsilon^{-2/3})$, can be improved. Here we follow Doikov et al. (2023) and assume that a first-order oracle call costs $N$ and a second-order oracle call costs $dN$. In this paper, we show that the computational cost can be reduced by reusing the Hessian across iterations. Our methods attain an overall computational complexity of $\tilde{\mathcal{O}}( (N+d^2)(d+ d^{2/3}\epsilon^{-2/3}))$, which improves on previous methods by a factor of $d^{1/3}$. Furthermore, we generalize our method to strongly-convex-strongly-concave minimax problems and establish a complexity of $\tilde{\mathcal{O}}((N+d^2) (d + d^{2/3} \kappa^{2/3}) )$ when the condition number of the problem is $\kappa$, a similar speedup over the state-of-the-art method. Numerical experiments on both real and synthetic datasets also verify the efficiency of our method.
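To see where the $d^{1/3}$ factor comes from, the two bounds can be compared numerically. The sketch below drops constants and logarithmic factors, and the values of `N`, `d`, and `eps` are arbitrary illustrative choices rather than settings from the paper.

```python
def baseline_cost(N, d, eps):
    """(N + d^2) * d * eps^(-2/3): a fresh Hessian at every iteration."""
    return (N + d**2) * d * eps ** (-2 / 3)

def lazy_cost(N, d, eps):
    """(N + d^2) * (d + d^(2/3) * eps^(-2/3)): Hessian reused across iterations."""
    return (N + d**2) * (d + d ** (2 / 3) * eps ** (-2 / 3))

N, d, eps = 10**6, 1000, 1e-6  # hypothetical problem sizes
ratio = baseline_cost(N, d, eps) / lazy_cost(N, d, eps)
print(ratio, d ** (1 / 3))
```

For small `eps` the additive `d` term is negligible, so the ratio approaches `d ** (1 / 3)`. For large `eps` the `d` term dominates and the advantage shrinks.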

Paper Structure

This paper contains 23 sections, 18 theorems, 79 equations, 2 figures, 1 table.

Key Result

Lemma 3.1

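Under Assumptions asm:prob-lip-hes and asm:prob-cc, we have […]. Furthermore, if Assumption asm:prob-scsc holds, then ${\bm{F}}(\cdot)$ is $\mu$-strongly monotone, i.e., $\langle {\bm{F}}({\bm{z}}) - {\bm{F}}({\bm{z}}'), {\bm{z}} - {\bm{z}}' \rangle \ge \mu \Vert {\bm{z}} - {\bm{z}}' \Vert^2$ for all ${\bm{z}}, {\bm{z}}'$.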

Figures (2)

  • Figure 1: Running time vs. gradient norm $\Vert {\bm{F}}({\bm{z}}) \Vert$ for Problem (\ref{eq:cubic-toy}) with different sizes: $n \in \{ 10, 100, 200\}$.
  • Figure 2: Running time vs. gradient norm $\Vert {\bm{F}}({\bm{z}}) \Vert$ for the fairness-aware machine learning task (Problem (\ref{eq:fair})) on the datasets "heart", "adult", and "law school".

Theorems & Definitions (36)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Lemma 3.1: Lemma 2.7 of [lin2022explicit]
  • Definition 3.4: [nesterov2007dual]
  • Definition 3.5
  • Remark 3.1
  • Lemma 4.1
  • Lemma 4.2
  • Theorem 4.1: C-C setting
  • ...and 26 more