A Framework for Finding Local Saddle Points in Two-Player Zero-Sum Black-Box Games

Shubhankar Agarwal, Hamzah I. Khan, Sandeep P. Chinchali, David Fridovich-Keil

TL;DR

This work tackles the challenge of finding local saddle points in unknown, nonconvex-nonconcave two-player zero-sum games using only zeroth-order samples. It introduces a two-level Bayesian optimization framework: a high-level GP surrogate refines the unknown objective by sampling at carefully chosen points, and a low-level general-sum game on the GP model identifies local Nash points to guide sampling. The authors develop LLGame, a Newton-based solver for the low-level game, and BSP, a high-level procedure that iteratively samples and updates the GP until a local saddle point is certified via first- and second-order conditions, with multiple variants to balance exploration, exploitation, and sampling cost. Experiments on synthetic benchmarks and ARIMA-MPC settings show the approach can outperform baselines and provide robustness advantages, including improved out-of-distribution performance in a robust MPC context. The framework offers a flexible, extensible template for black-box saddle-point optimization with zeroth-order data, highlighting both practical utility and avenues for future theoretical and scalability enhancements.

Abstract

Saddle point optimization is a critical problem employed in numerous real-world applications, including portfolio optimization, generative adversarial networks, and robotics. It has been extensively studied in cases where the objective function is known and differentiable. Existing work in black-box settings with unknown objectives that can only be sampled either assumes convexity-concavity in the objective to simplify the problem or operates with noisy gradient estimators. In contrast, we introduce a framework inspired by Bayesian optimization which utilizes Gaussian processes to model the unknown (potentially nonconvex-nonconcave) objective and requires only zeroth-order samples. Our approach frames the saddle point optimization problem as a two-level process which can flexibly integrate existing and novel approaches to this problem. The upper level of our framework produces a model of the objective function by sampling in promising locations, and the lower level of our framework uses the existing model to frame and solve a general-sum game to identify locations to sample. This lower level procedure can be designed in complementary ways, and we demonstrate the flexibility of our approach by introducing variants which appropriately trade off between factors like runtime, the cost of function evaluations, and the number of available initial samples. We experimentally demonstrate these algorithms on synthetic and realistic datasets in black-box nonconvex-nonconcave settings, showcasing their ability to efficiently locate local saddle points in these contexts.
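As a rough illustration of what a Newton-based low-level solver (such as the LLGame solver named in the TL;DR) does, the sketch below applies Newton's method to the stacked first-order conditions of a two-player game. It is not the authors' implementation: the toy objective, the function names, and the tolerances are all assumptions, and in the framework described above the objective would be a GP-based surrogate rather than a known closed-form function.

```python
import numpy as np

def first_order_residual(z):
    # Toy zero-sum game (an assumption, not from the paper):
    #   f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 + 0.25*x**4
    # Player 1 minimizes f1 = f over x; player 2 minimizes f2 = -f over y.
    x, y = z
    return np.array([x + y + x**3,  # d f1 / dx
                     y - x])        # d f2 / dy

def residual_jacobian(z):
    # Jacobian of the stacked residual above with respect to (x, y).
    x, _ = z
    return np.array([[1.0 + 3.0 * x**2, 1.0],
                     [-1.0, 1.0]])

def newton_stationary_point(z0, tol=1e-10, max_iter=50):
    # Plain Newton iteration on the first-order conditions (no line search).
    z = np.asarray(z0, dtype=float)
    for it in range(max_iter):
        g = first_order_residual(z)
        if np.linalg.norm(g) < tol:
            return z, it
        z = z + np.linalg.solve(residual_jacobian(z), -g)
    return z, max_iter

z_star, n_iter = newton_stationary_point([1.0, -0.5])
print(z_star, n_iter)
```

For this toy objective the iteration settles at the origin, where $f$ is convex in $x$ and concave in $y$, so the stationary point is a local saddle point. A plain Newton iteration only finds stationary points, so in general a separate second-order check is still needed.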

Paper Structure

This paper contains 23 sections, 4 theorems, 23 equations, 5 figures, 2 tables, 2 algorithms.

Key Result

Proposition 3.3

[Ratliff et al., 2016] For differentiable $f_1$ and $f_2$, a local Nash point $(x^*, y^*)$ satisfies $\nabla_{x} f_1(x^*, y^*) = 0$ and $\nabla_{y} f_2(x^*, y^*) = 0$.
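The first-order condition can be checked numerically on a small example. The sketch below uses a hypothetical zero-sum objective (not one from the paper) with $f_1 = f$ and $f_2 = -f$, and estimates both gradients with central finite differences; the function names and the step size `h` are illustrative assumptions.

```python
import numpy as np

def f(x, y):
    # Hypothetical toy objective: player 1 minimizes f over x,
    # player 2 maximizes f over y, i.e. f1 = f and f2 = -f.
    return x**2 - y**2 + x * y

def grad_fd(fun, x, y, h=1e-5):
    # Central finite-difference estimates of d fun/dx and d fun/dy.
    gx = (fun(x + h, y) - fun(x - h, y)) / (2 * h)
    gy = (fun(x, y + h) - fun(x, y - h)) / (2 * h)
    return gx, gy

def first_order_residual(x, y):
    gx, gy = grad_fd(f, x, y)
    # grad_x f1 = gx and grad_y f2 = -gy, because f2 = -f.
    return np.array([gx, -gy])

res_saddle = first_order_residual(0.0, 0.0)  # candidate local Nash point
res_other = first_order_residual(1.0, 1.0)   # an arbitrary other point
print(res_saddle, res_other)
```

Both residual entries vanish at the origin, while at $(1, 1)$ they are approximately $(3, 1)$, so that point cannot be a local Nash point. The condition is only necessary: a vanishing residual does not by itself certify a local Nash point.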

Figures (5)

  • Figure 1: Comparisons of selected algorithm variants with baselines: We compare variants of our proposed algorithms with baseline methods across two domains (rows), the decaying and high-dimension polynomials, whose landscapes are shown in the first column. The middle column considers test cases with a large number of initially sampled points, while the right column examines test cases with a limited number of initially sampled points. In each case, we report the value of the (real) merit function $M^f$ vs. the number of underlying function evaluations. Key takeaway: Generally, Ef-Xplore converges faster with a large number of initial samples, taking multiple Newton steps per iteration to exploit the accurate prior, while Exp-Xploit converges faster with limited samples, taking single Newton steps to avoid unfavorable regions amid uncertainty. Finally, Ef-Xplore and Exp-Xploit converge faster than all three baseline methods, which often fail to converge, indicating the benefit of the GP surrogate.
  • Figure 2: Saddle Point Optimization with BSP leads to Robust MPC on OoD data: On the left, we display the MPC tracking of the time series generated by the ARIMA model at various iterations for the Ef-Xplore variant. The ARIMA target trajectory is depicted in purple, while the corresponding MPC tracking is illustrated in orange. Initially, the MPC performs poorly (iteration 0), but its tracking gradually improves (iteration 17). In response, the ARIMA model makes tracking more challenging for the MPC (iteration 31), until both reach equilibrium (iteration 50). On the right, we compare the final robust MPC parameters (orange), obtained through our algorithm, to the nominal MPC parameters (blue) on in-distribution data (left column) and OoD data (right column). Key takeaway: our algorithm successfully identifies robust MPC parameters, which achieve 27.6% lower mean MPC cost on OoD data than the nominal MPC without reducing performance on in-distribution data.
  • Figure 3: The variance at the sampled points decreases over time.
  • Figure 4: Comparisons of our proposed algorithm variants: We compare all four variants of our proposed algorithms across the three experiments (each column) described in the experiments section. The horizontal axis denotes the number of underlying function evaluations, while the vertical axis represents the value of the real merit function, $M^f$. The top row considers test cases with a large number of initially sampled points, while the bottom row examines test cases with a limited number of initially sampled points. The key takeaway is that generally, exploit variants converge faster with a large number of initial samples due to effective utilization of accurate priors, while explore variants exhibit quicker convergence with limited samples by prioritizing exploration amid uncertainty. Efficient variants converge faster with many initial samples by taking multiple accurate Newton steps, while expensive variants show stable though often slower convergence with limited samples, taking single Newton steps to avoid unfavorable regions.
  • Figure : LLGame

Theorems & Definitions (9)

  • Definition 3.1: Nash Point (NP) for Two-Player General-Sum Game
  • Definition 3.2: Local Nash Point (LNP) for Two-Player General-Sum Game
  • Proposition 3.3: First-order Necessary Condition
  • Proposition 3.4: Second-order Sufficient Condition
  • Remark 3.5: Nash Point is a Saddle Point when $f_1$ = $-f_2$
  • Lemma 4.1: Convergence to Local Nash Point in LLGame
  • Lemma A.1: Equality of $\mathrm{UCB}_t$ and $\mathrm{LCB}_t$ at sampled points under zero observation noise
  • proof
  • Remark A.2: Approximate equality of UCB and LCB at sampled points under observation noise
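The intuition behind Lemma A.1 and Remark A.2 can be illustrated with a small NumPy Gaussian process. The sketch below assumes a unit-variance RBF kernel, confidence bounds of the common form $\mu \pm \beta\sigma$, and a hypothetical set of sample points. None of these choices come from the paper, whose exact definitions of $\mathrm{UCB}_t$ and $\mathrm{LCB}_t$ may differ.

```python
import numpy as np

def rbf(A, B, length=1.0):
    # Unit-variance squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xq, noise_var=0.0):
    # Standard GP regression posterior mean and standard deviation at Xq.
    K = rbf(X, X) + noise_var * np.eye(len(X))
    Ks = rbf(Xq, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Hypothetical sampled points and noiseless observations of a toy objective.
X = np.array([[0.0, 0.0], [1.5, -1.0], [-1.0, 2.0], [2.5, 2.0]])
y = X[:, 0] ** 2 - X[:, 1] ** 2
beta = 2.0  # assumed confidence multiplier

# Zero observation noise: the posterior variance vanishes at sampled points.
mean, std = gp_posterior(X, y, X)
gap_sampled = np.max((mean + beta * std) - (mean - beta * std))

# At an unsampled point the two bounds remain clearly separated.
_, std_q = gp_posterior(X, y, np.array([[0.7, 0.7]]))
gap_unsampled = 2 * beta * std_q[0]

# Small observation noise: the gap at sampled points is small but nonzero.
_, std_n = gp_posterior(X, y, X, noise_var=1e-4)
gap_noisy = np.max(2 * beta * std_n)
print(gap_sampled, gap_unsampled, gap_noisy)
```

With zero noise the posterior mean interpolates the observations and the bounds coincide at the sampled points up to floating-point error. With a small noise variance the gap shrinks to a small positive value instead, in line with the approximate equality described in Remark A.2.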