
Decision Making under Imperfect Recall: Algorithms and Benchmarks

Emanuel Tewolde, Brian Hu Zhang, Ioannis Anagnostides, Tuomas Sandholm, Vincent Conitzer

TL;DR

This paper introduces the first benchmark suite for imperfect-recall decision problems and extends the family of regret matching (RM) algorithms to nonlinear constrained optimization, establishing the RM family as a formidable approach to large-scale constrained optimization problems.

Abstract

In game theory, imperfect-recall decision problems model situations in which an agent forgets information it held before. They encompass games such as the "absentminded driver" and team games with limited communication. In this paper, we introduce the first benchmark suite for imperfect-recall decision problems. Our benchmarks capture a variety of problem types, including ones concerning privacy in AI systems that elicit sensitive information, and AI safety via testing of agents in simulation. Across 61 problem instances generated using this suite, we evaluate the performance of different algorithms for finding first-order optimal strategies in such problems. In particular, we introduce the family of regret matching (RM) algorithms for nonlinear constrained optimization. This class of parameter-free algorithms has enjoyed tremendous success in solving large two-player zero-sum games but, surprisingly, had remained relatively unexplored beyond that setting. Our key finding is that RM algorithms consistently outperform commonly employed first-order optimizers such as projected gradient descent, often by orders of magnitude. This establishes, for the first time, the RM family as a formidable approach to large-scale constrained optimization problems.
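The abstract describes using regret matching as a first-order optimizer over constraint sets such as products of simplices. A minimal sketch of one natural way to do this follows, assuming RM+ (the variant that clips cumulative regrets at zero) fed with the objective's gradient in place of per-action utilities. It is an illustration only, not necessarily the exact variant, update order, or iterate-selection rule evaluated in the paper; the payoff matrix `A` and the names `rm_plus_maximize` and `team_grad` are hypothetical.

```python
from typing import Callable, List

Block = List[float]   # one probability simplex (e.g., the actions at one infoset)
Point = List[Block]   # a point in the product of simplices


def rm_plus_maximize(grad: Callable[[Point], Point], x0: Point, iters: int = 1000) -> Point:
    """Sketch: regret matching+ (RM+) as a first-order optimizer over a product of simplices.

    The gradient block of each simplex is used as RM+'s per-action utility vector.
    There is no step size: each iterate is read off the cumulative positive regrets.
    Returns the last iterate.
    """
    cum_regret = [[0.0] * len(block) for block in x0]
    x = [list(block) for block in x0]
    for _ in range(iters):
        g = grad(x)  # all blocks are updated simultaneously from the same gradient
        new_x = []
        for i, (g_i, x_i) in enumerate(zip(g, x)):
            baseline = sum(ga * xa for ga, xa in zip(g_i, x_i))  # <g_i, x_i>
            # RM+: add instantaneous regrets g_i[a] - <g_i, x_i>, then clip at zero.
            cum_regret[i] = [max(r + ga - baseline, 0.0) for r, ga in zip(cum_regret[i], g_i)]
            total = sum(cum_regret[i])
            n = len(x_i)
            # Normalize positive regrets; fall back to uniform if all are zero.
            new_x.append([r / total for r in cum_regret[i]] if total > 0 else [1.0 / n] * n)
        x = new_x
    return x


# Hypothetical two-member team game with common payoff U(x, y) = x^T A y.
A = [[2.0, 0.0],
     [0.0, 1.0]]


def team_grad(point: Point) -> Point:
    x, y = point
    gx = [sum(A[a][b] * y[b] for b in range(2)) for a in range(2)]  # dU/dx = A y
    gy = [sum(A[a][b] * x[a] for a in range(2)) for b in range(2)]  # dU/dy = A^T x
    return [gx, gy]


x_star, y_star = rm_plus_maximize(team_grad, [[0.5, 0.5], [0.5, 0.5]], iters=50)
print(x_star, y_star)  # [1.0, 0.0] [1.0, 0.0]
```

Unlike projected gradient descent, this loop has no step size to tune, which is the sense in which the RM family is parameter-free. From the uniform start, both team members coordinate on the first action (common payoff 2) after a single update and stay there.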

Paper Structure (34 sections, 3 theorems, 3 equations, 70 figures, 2 tables, 5 algorithms)

This paper contains 34 sections, 3 theorems, 3 equations, 70 figures, 2 tables, and 5 algorithms.

Key Result

Theorem 1

1. Maximizing the utility in a decision-making problem under imperfect recall is captured by maximizing $U$, a polynomial function in $\bm{x}$, over the product of simplices $\mathcal{X}$.
2. Any constrained maximization problem $\max_{\bm{x} \in \mathcal{X}} p(\bm{x})$ of a polynomial function $p
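Item 1 can be checked by hand on the absentminded driver mentioned in the abstract. The sketch below assumes the classic payoffs usually used for this example (0 for exiting at the first intersection, 4 for exiting at the second, 1 for driving past both); the paper's benchmark instance may use other numbers. Both intersections belong to one infoset, so a single simplex (exit, continue) parameterizes the strategy, and traversing the tree yields $U$ as a polynomial in the continue probability $p$. The tree encoding and the names `DRIVER_TREE`, `expected_utility`, and `U` are illustrative.

```python
from fractions import Fraction

# A tree node is either ("terminal", payoff) or ("decision", infoset, {action: child}).
# Both decision nodes share infoset "I": the driver cannot tell the intersections apart.
# Payoffs 0 / 4 / 1 are the classic textbook values, assumed here for illustration.
DRIVER_TREE = ("decision", "I", {
    "exit": ("terminal", 0),              # exit at the first intersection
    "continue": ("decision", "I", {
        "exit": ("terminal", 4),          # exit at the second intersection
        "continue": ("terminal", 1),      # drive past both
    }),
})


def expected_utility(node, strategy):
    """Expected payoff under a behavioral strategy {infoset: {action: prob}}."""
    if node[0] == "terminal":
        return node[1]
    _, infoset, children = node
    return sum(strategy[infoset][a] * expected_utility(child, strategy)
               for a, child in children.items())


def U(p):
    """Utility as a function of p = probability of 'continue' at infoset I."""
    return expected_utility(DRIVER_TREE, {"I": {"exit": 1 - p, "continue": p}})


# Infoset I is visited twice on one path, so U is the degree-2 polynomial 4p - 3p^2.
grid = [Fraction(k, 300) for k in range(301)]
assert all(U(p) == 4 * p - 3 * p * p for p in grid)
best = max(grid, key=U)
print(best, U(best))  # 2/3 4/3
```

Without absentmindedness, no infoset repeats along a path, so every term of $U$ would be multilinear in the strategy variables; the $p^2$ term here comes precisely from revisiting infoset I.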

Figures (70)

  • Figure 1: Three tree-form decision problems, discussed after the definition of a decision problem. The top and bottom ones are of imperfect recall. The right one further exhibits absentmindedness.
  • Figure 2: Top: A simple simulation problem. The misaligned agent receives $10$ utility for its preferred action (which is the bad action for the simulator), and $1$ utility for the other. The simulator decides to simulate with probability $4/5$, at most $2$ times, in order to catch misaligned behavior in advance. Bottom: A slightly more complex simulation problem with two testing scenarios. The deployment phase (subgame $\Gamma'$) is visualized in the appendix.
  • Figure 3: Subgroup detection under privacy constraints. On the left, we see an arbitrary graph with two subgroups (a $3$-clique, and a star of degree $3$). The goal is to find as many of the subgroups' nodes as possible. On the right, we see another such decision problem on a 2D grid, which we visualize as an instance of the Absentminded Battleship game. The agent has already succeeded in hitting one node of each ship, which indicates that there must be more subgroup nodes nearby. The agent does not remember whether it has selected any cell other than these two before.
  • Figure 4: Benchmark problem instances with $\sim$800 infosets, $\sim$300 infosets, 3 infosets, and $\sim$100 infosets, respectively.
  • Figure 5: The decision-making instance obtained by applying the construction from the proof of Theorem 1 to the polynomial maximization $\max 2x^2y - 3xyz$ s.t. $0 \leq x,y,z \leq 1$.
  • ...and 65 more figures
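Figure 5 starts from the box-constrained polynomial $\max 2x^2y - 3xyz$ s.t. $0 \leq x,y,z \leq 1$. For a concrete picture of the baseline family named in the abstract, here is a minimal sketch of projected gradient ascent on that polynomial directly over $[0,1]^3$, before any reduction to a decision problem. The step size 0.1, the start $(0.5, 0.5, 0.5)$, the iteration count, and the function names are assumptions for illustration. The global maximum is $2$ at $(1, 1, 0)$, since $2x^2y - 3xyz \leq 2x^2y \leq 2$ on the box.

```python
def f(x, y, z):
    """Objective from Figure 5: 2x^2 y - 3xyz."""
    return 2 * x * x * y - 3 * x * y * z


def grad_f(x, y, z):
    return (4 * x * y - 3 * y * z,      # df/dx
            2 * x * x - 3 * x * z,      # df/dy
            -3 * x * y)                 # df/dz


def clip01(v):
    """Euclidean projection of a scalar onto [0, 1]."""
    return min(1.0, max(0.0, v))


def projected_gradient_ascent(start=(0.5, 0.5, 0.5), step=0.1, iters=200):
    x, y, z = start
    for _ in range(iters):
        gx, gy, gz = grad_f(x, y, z)
        # Gradient step, then project back onto the box [0, 1]^3 coordinate-wise.
        x, y, z = clip01(x + step * gx), clip01(y + step * gy), clip01(z + step * gz)
    return (x, y, z)


point = projected_gradient_ascent()
print(point, f(*point))  # (1.0, 1.0, 0.0) 2.0
```

Note that this baseline needs a hand-picked step size, whereas RM-style updates do not. On this tiny instance the corner $(1, 1, 0)$ is reached after a handful of steps and is a fixed point of the projected update.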

Theorems & Definitions (5)

  • Definition 1
  • Theorem 1: Gimbert20:Bridge; Tewolde23:Computational
  • Proposition 1: KollerM92; Tewolde23:Computational
  • Definition 2
  • Theorem 2: Anagnostides26:Convergence