
Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics

Deep Patel, Emmanouil-Vasileios Vlatakis-Gkaragkounis

TL;DR

This work provides the first non-asymptotic global convergence guarantees for min-max games involving two-layer neural networks, exploiting hidden convexity and overparameterization. It analyzes alternating gradient descent-ascent (AltGDA) through a path-length/Lyapunov argument that guarantees global convergence to ε-Nash equilibria in broad hidden convex-concave settings, provided the networks are sufficiently wide and favorably initialized. The results cover both input-optimization games (randomly initialized fixed mappings) and neural-parameter games (trainable networks), with explicit width scaling and spectral conditions tied to Jacobian conditioning. Regularization and data geometry play crucial roles in stabilizing the dynamics and enabling Polyak–Łojasiewicz-type convergence. These insights can guide architectural and optimization choices for scalable, reliable multi-agent learning in adversarial and robust settings.
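For reference, the ε-Nash equilibrium targeted above is usually defined as follows for $\min_x \max_y f(x,y)$. This is the standard form, assumed here; the paper's exact convention (for example, a duality-gap criterion) may differ slightly:

$$
f(\hat x, y) - \epsilon \;\le\; f(\hat x, \hat y) \;\le\; f(x, \hat y) + \epsilon \qquad \text{for all } x \in \mathcal{X},\; y \in \mathcal{Y}.
$$

This condition implies a duality gap $\max_{y} f(\hat x, y) - \min_{x} f(x, \hat y) \le 2\epsilon$, and conversely a duality gap of at most $\epsilon$ implies it.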

Abstract

Many emerging applications, such as adversarial training, AI alignment, and robust optimization, can be framed as zero-sum games between neural nets, with von Neumann-Nash equilibria (NE) capturing the desirable system behavior. While such games often involve non-convex non-concave objectives, empirical evidence shows that simple gradient methods frequently converge, suggesting a hidden geometric structure. In this paper, we provide a theoretical framework that explains this phenomenon through the lens of hidden convexity and overparameterization. We identify sufficient conditions (spanning initialization, training dynamics, and network width) that guarantee global convergence to an NE in a broad class of non-convex min-max games. To our knowledge, this is the first such result for games that involve two-layer neural networks. Technically, our approach is twofold: (a) we derive a novel path-length bound for the alternating gradient descent-ascent scheme in min-max games; and (b) we show that the reduction from a hidden convex-concave geometry to a two-sided Polyak-Łojasiewicz (PŁ) min-max condition holds with high probability under overparameterization, using tools from random matrix theory.
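The alternating update analyzed in (a) can be illustrated on a toy problem. The snippet below is a minimal sketch, not the paper's neural setting: it assumes a scalar strongly-convex-strongly-concave quadratic $f(x,y) = \tfrac{a}{2}x^2 + bxy - \tfrac{c}{2}y^2$ (which satisfies the two-sided PŁ condition), an illustrative step size `eta`, and hypothetical helper names `alt_gda` and `duality_gap`. The key feature of AltGDA is that the max-player's step uses the min-player's *already updated* iterate.

```python
def f(x, y, a=1.0, b=2.0, c=1.0):
    """Toy objective: strongly convex in x, strongly concave in y; saddle point at (0, 0)."""
    return 0.5 * a * x * x + b * x * y - 0.5 * c * y * y


def alt_gda(x, y, eta=0.1, steps=500, a=1.0, b=2.0, c=1.0):
    """Alternating GDA: the min-player moves first, then the max-player responds to the new x."""
    for _ in range(steps):
        x = x - eta * (a * x + b * y)  # descent on df/dx = a x + b y
        y = y + eta * (b * x - c * y)  # ascent on df/dy = b x - c y, using the updated x
    return x, y


def duality_gap(x, y, a=1.0, b=2.0, c=1.0):
    """max_y' f(x, y') - min_x' f(x', y), using the closed-form best responses of this quadratic."""
    best_response_y = b * x / c   # argmax_y' f(x, y')
    best_response_x = -b * y / a  # argmin_x' f(x', y)
    return f(x, best_response_y, a, b, c) - f(best_response_x, y, a, b, c)


x_final, y_final = alt_gda(1.0, -1.0)
print(duality_gap(1.0, -1.0), duality_gap(x_final, y_final))
```

With these constants the linear AltGDA map has spectral radius $(1-\eta a)^{1/2}(1-\eta c)^{1/2} = 0.9$, so the gap shrinks geometrically from its initial value of 5.0.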

Paper Structure

This paper contains 43 sections, 21 theorems, 119 equations, 2 figures, 1 table.

Key Result

Lemma 2.7

If the objective function $f$ satisfies the two-sided PŁ condition, then all three solution concepts in Definition def:sol-concept are equivalent.
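For reference, the two-sided PŁ condition takes the following form in yang2020global (restated from that work, with constants $\mu_1, \mu_2 > 0$; the paper's notation may differ). For all $x, y$:

$$
\|\nabla_x f(x,y)\|^2 \;\ge\; 2\mu_1 \Big[f(x,y) - \min_{x'} f(x',y)\Big],
\qquad
\|\nabla_y f(x,y)\|^2 \;\ge\; 2\mu_2 \Big[\max_{y'} f(x,y') - f(x,y)\Big].
$$

Any smooth strongly-convex-strongly-concave objective satisfies it. Part (b) of the abstract claims that hidden convex-concave neural games can also satisfy it, with high probability under overparameterization.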

Figures (2)

  • Figure 1: Illustration of a maze environment where each agent must reason over a vast space of action sequences. Instead of explicitly constructing and searching the full decision tree, a neural network implicitly encodes both the value of paths and the policy for navigation, learning an effective strategy dynamically without ever uncovering the complete structure of the maze.
  • Figure 2: A trajectory of AltGDA in an $\ell_2$-regularized hidden game of Rock-Paper-Scissors. The curves trace each player's strategy in the latent space (the 2-dimensional simplex).
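The setting of Figure 2 can be mimicked with a small sketch. Several choices here are assumptions, not taken from the paper. Each player's latent mixed strategy is the softmax of a trainable logit vector, a simple stand-in for the paper's neural mapping. The regularized objective is $f(p,q) = p^\top A q + \tfrac{\lambda}{2}\|p\|^2 - \tfrac{\lambda}{2}\|q\|^2$, with the standard RPS matrix $A$. The values of `lam`, `eta`, and the initial logits are illustrative, and `hidden_rps_altgda` is a hypothetical name. Under these choices the only stationary point is the uniform strategy for both players, and from the initialization below the AltGDA iterates approach it.

```python
import math

# Standard Rock-Paper-Scissors payoff matrix (skew-symmetric).
A = [[0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]


def softmax(z):
    """Numerically stable softmax: latent mixed strategy from logits."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def softmax_vjp(p, g):
    """Chain rule through softmax: (diag(p) - p p^T) g."""
    pg = sum(pi * gi for pi, gi in zip(p, g))
    return [pi * (gi - pg) for pi, gi in zip(p, g)]


def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def hidden_rps_altgda(u, v, lam=1.0, eta=1.0, steps=3000):
    """AltGDA on logits u (min-player) and v (max-player); returns (p, q) after each step."""
    At = transpose(A)
    trajectory = []
    for _ in range(steps):
        p, q = softmax(u), softmax(v)
        grad_p = [aq + lam * pi for aq, pi in zip(matvec(A, q), p)]  # df/dp
        u = [ui - eta * d for ui, d in zip(u, softmax_vjp(p, grad_p))]
        p = softmax(u)  # the max-player responds to the updated min-player
        grad_q = [ap - lam * qi for ap, qi in zip(matvec(At, p), q)]  # df/dq
        v = [vi + eta * d for vi, d in zip(v, softmax_vjp(q, grad_q))]
        trajectory.append((p, softmax(v)))
    return trajectory


traj = hidden_rps_altgda([0.3, 0.0, -0.3], [-0.2, 0.2, 0.0])
print(traj[-1])
```

Plotting the successive `p` and `q` on the simplex gives trajectories of the kind Figure 2 depicts, under these stand-in assumptions.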

Theorems & Definitions (45)

  • Remark 2.2
  • Definition 2.3: Two-layer Neural Network
  • Lemma 2.7: Lemma 2.1 in yang2020global, Appendix C in kalogiannis2025solving
  • Definition 3.1: Lyapunov Potential yang2020global
  • Lemma 3.1: Theorem 3.2 in yang2020global
  • Lemma 3.1: Upper Bound on Initial Potential ($P_0$)
  • Lemma 3.2
  • Theorem 3.3
  • Lemma 3.5: Lemma 3 & Appendix E.1--E.4 in song2021subquadratic
  • Theorem 3.6: $\prod$ Games with AltGDA
  • ...and 35 more
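The Lyapunov potential of Definition 3.1 is cited to yang2020global. A minimal sketch follows, assuming it has the familiar form used in that AltGDA analysis, $P_t = a_t + \lambda b_t$ with $\Phi(x) = \max_y f(x,y)$, $a_t = \Phi(x_t) - \min_x \Phi$, and $b_t = \Phi(x_t) - f(x_t, y_t)$. It tracks $P_t$ along AltGDA on the same kind of toy quadratic used for illustration elsewhere. The weight `lam = 0.1`, the step size, and the helper names are illustrative assumptions.

```python
def f(x, y, a=1.0, b=2.0, c=1.0):
    """Toy strongly-convex-strongly-concave quadratic with saddle point (0, 0)."""
    return 0.5 * a * x * x + b * x * y - 0.5 * c * y * y


def phi(x, a=1.0, b=2.0, c=1.0):
    """Phi(x) = max_y f(x, y); the maximizer is y*(x) = b x / c."""
    return f(x, b * x / c, a, b, c)


def lyapunov_trace(x, y, lam=0.1, eta=0.1, steps=300, a=1.0, b=2.0, c=1.0):
    """Potential P_t = a_t + lam * b_t evaluated before each AltGDA step (and once at the end)."""
    phi_star = 0.0  # min_x Phi(x) = 0, attained at x = 0 for this quadratic
    potentials = []
    for _ in range(steps + 1):
        a_t = phi(x, a, b, c) - phi_star          # suboptimality of the min-player
        b_t = phi(x, a, b, c) - f(x, y, a, b, c)  # gap to the max-player's best response
        potentials.append(a_t + lam * b_t)
        x = x - eta * (a * x + b * y)   # min-player step
        y = y + eta * (b * x - c * y)   # max-player step with the updated x
    return potentials


P = lyapunov_trace(1.0, 0.0)
print(P[0], P[-1])
```

Both terms are nonnegative by construction, so $P_t$ vanishes only at the saddle point. In this toy example it decays geometrically along with the iterates.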