Pareto-Optimal Algorithms for Learning in Games

Eshwar Ram Arunachaleswaran, Natalie Collina, Jon Schneider

TL;DR

This paper reframes learning in two-player repeated games with unknown payoffs through the lens of asymptotic Pareto-optimality. It introduces the asymptotic menu of a learning algorithm, the long-run set of correlated strategy profiles that an optimizer can implement against it, and uses Blackwell approachability to characterize which menus can arise. The key findings are that all no-swap-regret (NSR) algorithms share a single asymptotic menu, which is Pareto-optimal and minimal within the broader class of no-regret (NR) menus, so NSR algorithms are strategically equivalent in the asymptotic sense; by contrast, many NR menus are Pareto-dominated. The work also analyzes mean-based and FTRL-style algorithms, showing that their menus can be Pareto-dominated, and gives constructive guidance on designing learning algorithms by shaping their asymptotic menu. Overall, it offers a geometric, menu-centered framework for evaluating and constructing learning rules in strategic, unknown-payoff environments, with implications for auction design and Stackelberg-like settings.

Abstract

We study the problem of characterizing optimal learning algorithms for playing repeated games against an adversary with unknown payoffs. In this problem, the first player (called the learner) commits to a learning algorithm against a second player (called the optimizer), and the optimizer best-responds by choosing the optimal dynamic strategy for their (unknown but well-defined) payoff. Classic learning algorithms (such as no-regret algorithms) provide some counterfactual guarantees for the learner, but might perform much more poorly than other learning algorithms against particular optimizer payoffs. In this paper, we introduce the notion of asymptotically Pareto-optimal learning algorithms. Intuitively, if a learning algorithm is Pareto-optimal, then there is no other algorithm which performs asymptotically at least as well against all optimizers and performs strictly better (by at least $\Omega(T)$) against some optimizer. We show that well-known no-regret algorithms such as Multiplicative Weights and Follow The Regularized Leader are Pareto-dominated. However, while no-regret is not enough to ensure Pareto-optimality, we show that a strictly stronger property, no-swap-regret, is a sufficient condition for Pareto-optimality. Proving these results requires us to address various technical challenges specific to repeated play, including the fact that there is no simple characterization of how optimizers who are rational in the long-term best-respond against a learning algorithm over multiple rounds of play. To address this, we introduce the idea of the asymptotic menu of a learning algorithm: the convex closure of all correlated distributions over strategy profiles that are asymptotically implementable by an adversary. We show that all no-swap-regret algorithms share the same asymptotic menu, implying that all no-swap-regret algorithms are "strategically equivalent".
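The distinction the abstract draws between no-regret and the strictly stronger no-swap-regret property can be made concrete on a finite transcript of play. The sketch below is illustrative only: it assumes the standard textbook definitions of external and swap regret for a learner playing pure actions, and the names `u_L`, `learner_plays`, `optimizer_plays`, `external_regret`, and `swap_regret` are hypothetical, not taken from the paper.

```python
# Illustrative sketch (standard definitions assumed, names hypothetical).
# u_L[i][j] is the learner's payoff when the learner plays i and the
# optimizer plays j.

def external_regret(u_L, learner_plays, optimizer_plays):
    """Gain from switching every round to the best single fixed action."""
    realized = sum(u_L[i][j] for i, j in zip(learner_plays, optimizer_plays))
    best_fixed = max(
        sum(row[j] for j in optimizer_plays) for row in u_L
    )
    return best_fixed - realized


def swap_regret(u_L, learner_plays, optimizer_plays):
    """Gain from the best swap function (each action a remapped to some k).

    The best swap function decomposes per action: for each action a,
    pick the replacement that is best on the rounds where a was played.
    """
    total = 0
    for a in range(len(u_L)):
        # Optimizer actions in the rounds where the learner played a.
        faced = [j for i, j in zip(learner_plays, optimizer_plays) if i == a]
        realized = sum(u_L[a][j] for j in faced)
        best_swap = max(sum(row[j] for j in faced) for row in u_L)
        total += best_swap - realized
    return total


# A 2x2 coordination payoff for the learner, who miscoordinates every round.
u_L = [[1, 0],
       [0, 1]]
learner_plays = [0, 1, 0, 1]
optimizer_plays = [1, 0, 1, 0]

ext = external_regret(u_L, learner_plays, optimizer_plays)
swp = swap_regret(u_L, learner_plays, optimizer_plays)
print(ext, swp)
```

On this transcript the swap regret (4) exceeds the external regret (2): no single fixed action recovers more than half of the missed payoff, while swapping each action for the other recovers all of it. Swap regret is never smaller than external regret, since a constant swap function is one of the options it maximizes over.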

Paper Structure

This paper contains 43 sections, 60 theorems, 15 equations, and 6 figures.

Key Result

Lemma 3.1

For any learning algorithm $\mathcal{A}$, $V_{L}(\mathcal{M}(\mathcal{A}), u_O) = V_{L}(\mathcal{A}, u_O)$.

Figures (6)

  • Figure 2: A simple trajectory with five segments.
  • Figure 3: The trajectory ending at $X_4$ is drawn in red with the last segment in blue. It can be seen as a convex combination of the trajectory up to $X_3$ (all in red) and the latter's extension, marked by the purple segment.
  • Figure 4: An offset trajectory (in red) that is also a spiral, with $X_4$ and $X_0$ on the same ray (along $r_{AB}$ and marked in purple).
  • Figure 5: The state space with hard and soft boundaries, in black and purple respectively.
  • Figure 6: A trajectory is shown with different types of segments having different colors. In order, it contains: an I segment (in green), a C2 segment (in blue), an NC1 segment (in red), a C1 segment (in blue), and a final F segment (in green).
  • ...and 1 more figure

Theorems & Definitions (73)

  • Definition 2.1: Asymptotic Pareto-dominance for learning algorithms
  • Definition 2.2
  • Definition 2.3
  • Lemma 3.1
  • Definition 3.2: Pareto-dominance for asymptotic menus
  • Theorem 3.3
  • Lemma 3.4
  • Lemma 3.5
  • Theorem 3.6
  • Lemma 3.7
  • ...and 63 more