Policy Space Response Oracles: A Survey

Ariyan Bighashdel; Yongzhao Wang; Stephen McAleer; Rahul Savani; Frans A. Oliehoek

TL;DR

Policy Space Response Oracles (PSRO) is a framework that scales game-theoretic analysis to large games by maintaining a restricted strategy space and iteratively expanding it with learning-based best responses. Two components drive the loop: a meta-strategy solver (MSS), which selects the profile to respond to, and a response oracle (RO), which computes the new strategy. By unifying ideas from the double oracle algorithm (DO), empirical game-theoretic analysis (EGTA), and population-based reinforcement learning, PSRO supports diverse game forms (normal-form, extensive-form, mean-field) and solution concepts beyond Nash equilibrium. It addresses issues such as overfitting and exploration through mechanisms including the minimum-regret constrained profile (MRCP), correlated and coarse correlated equilibria (CE/CCE), rectified Nash, and automated MSS design. The survey catalogues MSS and RO variants, their theoretical properties, evaluation methodologies, and efficiency improvements (parallelization, transfer learning, and sub-sample modeling), and it points to applications and available implementations (OpenSpiel, MALib). Open questions include scaling to many players, fully parallel architectures, equilibrium refinements, handling multiple equilibria, integration with counterfactual regret minimization (CFR) and subgame solving, automatic hyperparameter tuning, and the incorporation of large language models. The survey closes with concrete directions for scalable, robust multiagent learning in complex domains and a structured roadmap for future PSRO research and deployment.

Abstract

Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis scales poorly with the number of strategies, which precludes direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds promise to improve scalability by focusing attention on sufficient subsets of strategies. We first motivate PSRO and provide historical context. We then focus on the strategy exploration problem for PSRO: the challenge of assembling, at minimal computational cost, effective subsets of strategies that still represent the original game well. We survey current research directions for enhancing the efficiency of PSRO, and explore the applications of PSRO across various domains. We conclude by discussing open questions and future research.
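
The loop that PSRO builds on (solve a restricted game, compute best responses to the resulting meta-strategy, add them, repeat) can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the survey's implementation: the game is a two-player zero-sum matrix game (rock-paper-scissors), fictitious play stands in for the meta-strategy solver, and an exact best response over the full payoff matrix stands in for a learned response oracle. The names `PAYOFF`, `fictitious_play`, and `psro` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

# Row player's payoffs in rock-paper-scissors (actions 0=rock, 1=paper, 2=scissors).
# The game is zero-sum, so the column player receives the negation.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def fictitious_play(sub: List[List[float]], iters: int = 2000) -> Tuple[List[float], List[float]]:
    """Stand-in meta-strategy solver: approximate a Nash equilibrium of the
    restricted zero-sum game `sub` by fictitious play."""
    n_rows, n_cols = len(sub), len(sub[0])
    row_counts = [1] + [0] * (n_rows - 1)
    col_counts = [1] + [0] * (n_cols - 1)
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical play so far.
        row_vals = [sum(sub[i][j] * col_counts[j] for j in range(n_cols)) for i in range(n_rows)]
        col_vals = [sum(sub[i][j] * row_counts[i] for i in range(n_rows)) for j in range(n_cols)]
        row_counts[max(range(n_rows), key=row_vals.__getitem__)] += 1
        col_counts[min(range(n_cols), key=col_vals.__getitem__)] += 1  # column minimizes
    total_r, total_c = sum(row_counts), sum(col_counts)
    return [c / total_r for c in row_counts], [c / total_c for c in col_counts]


def psro(payoff: List[List[float]], max_iters: int = 10):
    """Double-oracle-style loop: grow each player's restricted strategy set
    with a best response to the opponent's current meta-strategy."""
    rows, cols = [0], [0]  # restricted strategy sets, seeded with action 0
    sigma_r, sigma_c = [1.0], [1.0]
    for _ in range(max_iters):
        # 1. Build and solve the restricted (empirical) game.
        sub = [[payoff[i][j] for j in cols] for i in rows]
        sigma_r, sigma_c = fictitious_play(sub)
        # 2. Response oracle: exact best response in the FULL game
        #    (a placeholder for an RL-trained best response).
        row_vals = [sum(payoff[i][j] * p for j, p in zip(cols, sigma_c)) for i in range(len(payoff))]
        col_vals = [sum(payoff[i][j] * p for i, p in zip(rows, sigma_r)) for j in range(len(payoff[0]))]
        br_r = max(range(len(payoff)), key=row_vals.__getitem__)
        br_c = min(range(len(payoff[0])), key=col_vals.__getitem__)
        # 3. Stop when neither oracle finds a strategy outside the restricted sets.
        if br_r in rows and br_c in cols:
            break
        if br_r not in rows:
            rows.append(br_r)
        if br_c not in cols:
            cols.append(br_c)
    return rows, cols, sigma_r, sigma_c


if __name__ == "__main__":
    rows, cols, sigma_r, sigma_c = psro(PAYOFF)
    print("row strategies:", rows, "meta-strategy:", [round(p, 2) for p in sigma_r])
    print("col strategies:", cols, "meta-strategy:", [round(p, 2) for p in sigma_c])
```

In this toy game every action ends up in both restricted sets, because the only equilibrium of rock-paper-scissors has full support; in larger games the point of the approach is that the loop can stop with a much smaller subset. Swapping `fictitious_play` for another solver (for example, a uniform distribution over the restricted set) changes which strategies are explored, which is the design axis the TL;DR refers to as the choice of MSS.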
