Learning to Recommend in Unknown Games

Arwa Alanqary; Zakaria Baba; Manxi Wu; Alexandre M. Bayen

TL;DR

Under quantal-response feedback, the game is learnable up to a positive affine equivalence class, with sample complexity logarithmic in the desired precision; under best-response feedback, agents' utilities can only be identified up to a larger set of indistinguishable games.
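A minimal sketch of why only a positive affine class is identifiable. It assumes, purely for illustration and not necessarily as the paper's exact model, a two-action agent with a logit quantal response and an unknown rationality parameter `lam`. Under this assumption the log-odds of following a recommendation equal `lam` times a utility difference, so additive shifts cancel and the unknown positive scale can only be removed by normalizing. The helpers `qr_follow_prob`, `estimate_scaled_difference`, and `normalized_differences` are hypothetical; the naive frequency estimator is not the paper's learning algorithm and does not attain its logarithmic sample complexity.

```python
import math
import random


def qr_follow_prob(u_rec, u_alt, lam):
    """Logit quantal response of a two-action agent (assumed form): probability
    of playing the recommended action (utility u_rec) over the alternative (u_alt)."""
    return 1.0 / (1.0 + math.exp(-lam * (u_rec - u_alt)))


def estimate_scaled_difference(u_rec, u_alt, lam, n_rounds, rng):
    """Repeat one recommendation n_rounds times, count compliance, and invert
    the logistic link: logit(p_hat) estimates lam * (u_rec - u_alt)."""
    p = qr_follow_prob(u_rec, u_alt, lam)
    follows = sum(rng.random() < p for _ in range(n_rounds))
    # Clip the empirical frequency away from 0 and 1 so the logit stays finite.
    p_hat = min(max(follows / n_rounds, 1.0 / n_rounds), 1.0 - 1.0 / n_rounds)
    return math.log(p_hat / (1.0 - p_hat))


def normalized_differences(u, lam):
    """Noise-free observable log-odds lam * (u[0][k] - u[1][k]) for each opponent
    profile k, divided by the first one so the unknown scale lam cancels.
    Assumes the first difference is nonzero."""
    log_odds = [lam * (u[0][k] - u[1][k]) for k in range(len(u[0]))]
    return [x / log_odds[0] for x in log_odds]


# u[action][opponent profile]; v is a positive affine transform of u.
u = [[3.0, 1.0, 0.0], [1.0, 2.0, 2.0]]
v = [[2.0 * x + 5.0 for x in row] for row in u]

print(normalized_differences(u, lam=0.7))  # [1.0, -0.5, -1.0]
print(normalized_differences(v, lam=0.7))  # same vector: u and v look identical
# Sampled estimate of 0.5 * (3.0 - 1.0) = 1.0 from compliance frequencies.
print(estimate_scaled_difference(3.0, 1.0, lam=0.5, n_rounds=20000, rng=random.Random(0)))
```

Only the ratios of utility differences survive, which is exactly the information that is invariant under maps of the form u -> alpha * u + beta with alpha > 0.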

Abstract

We study preference learning through recommendations in multi-agent game settings, where a moderator repeatedly interacts with agents whose utility functions are unknown. In each round, the moderator issues action recommendations and observes whether agents follow or deviate from them. We consider two canonical behavioral feedback models, best response and quantal response, and study how the information revealed by each model affects the learnability of agents' utilities. We show that under quantal-response feedback the game is learnable, up to a positive affine equivalence class, with sample complexity logarithmic in the desired precision, whereas best-response feedback identifies agents' utilities only up to a larger set of indistinguishable games. We give a complete geometric characterization of this set. Moreover, we introduce a regret notion based on agents' incentives to deviate from recommendations and design an online algorithm with low regret under both feedback models, with bounds scaling linearly in the game dimension and logarithmically in time. Our results lay a theoretical foundation for AI recommendation systems in strategic multi-agent environments, where compliance with recommendations is shaped by strategic interaction.
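To make the two feedback models and a deviation-based regret concrete, here is a minimal sketch under stated assumptions: a finite normal-form game with utilities stored as dictionaries over joint actions; each agent evaluates its recommendation against the other agents' recommended actions; ties under best response count as following; and quantal response takes a logit form with rationality parameter `lam`. The function `deviation_incentive` is a hypothetical stand-in for the paper's regret notion (the largest unilateral gain from ignoring the recommendation); the paper's exact definition and its aggregation across agents may differ. All names and the example game are illustrative.

```python
import math
import random


def deviations(i, joint, actions):
    """Yield (b, alt): every action b of agent i and the joint action obtained
    by replacing agent i's recommended action with b (b may equal the original)."""
    for b in actions[i]:
        alt = list(joint)
        alt[i] = b
        yield b, tuple(alt)


def br_feedback(utilities, joint, actions):
    """Best-response feedback: agent i follows iff its recommended action is a
    best response to the others' recommended actions (ties count as following)."""
    follows = []
    for i in range(len(actions)):
        best = max(utilities[i][alt] for _, alt in deviations(i, joint, actions))
        follows.append(utilities[i][joint] >= best)
    return follows


def qr_feedback(utilities, joint, actions, lam, rng):
    """Logit quantal-response feedback: agent i samples an action with probability
    proportional to exp(lam * utility) and follows iff it samples its recommendation."""
    follows = []
    for i in range(len(actions)):
        options = list(deviations(i, joint, actions))
        weights = [math.exp(lam * utilities[i][alt]) for _, alt in options]
        choice = rng.choices([b for b, _ in options], weights=weights)[0]
        follows.append(choice == joint[i])
    return follows


def deviation_incentive(utilities, joint, actions):
    """Largest gain any single agent obtains by unilaterally deviating from the
    recommended joint action; zero exactly at pure Nash equilibria."""
    return max(
        max(utilities[i][alt] for _, alt in deviations(i, joint, actions)) - utilities[i][joint]
        for i in range(len(actions))
    )


# Hypothetical two-player coordination game: UTILITIES[i][(a0, a1)].
ACTIONS = [(0, 1), (0, 1)]
UTILITIES = [
    {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
    {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
]

print(br_feedback(UTILITIES, (0, 1), ACTIONS))          # [False, False]
print(br_feedback(UTILITIES, (0, 0), ACTIONS))          # [True, True]
print(deviation_incentive(UTILITIES, (0, 1), ACTIONS))  # 2.0
print(qr_feedback(UTILITIES, (0, 0), ACTIONS, lam=1.0, rng=random.Random(0)))

# Cumulative regret of a fixed sequence of recommendations.
history = [(0, 1), (1, 1), (0, 0)]
print(sum(deviation_incentive(UTILITIES, a, ACTIONS) for a in history))  # 2.0
```

Best-response feedback is a deterministic yes/no per agent, while quantal-response feedback is a noisy sample whose frequencies carry graded information about utility gaps; this difference is what drives the learnability gap discussed in the paper.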

Paper Structure (43 sections, 24 theorems, 95 equations, 1 figure, 5 algorithms)

Introduction
  Our contributions
Related work
  Inverse game theory
  Learning optimal Stackelberg strategies
  Contextual search and inverse optimization
Model and preliminaries
  Recommendations in unknown games
  Choice models
  Regret
Learnability
  Indistinguishability
  Learnability
Learnability from the quantal-response feedback
  Step 1: Identification of utility differences up to scale
...and 28 more sections

Key Result

Theorem 1

The game utility functions are learnable (up to positive affine transformations) under the quantal-response (QR) model, but not under the best-response (BR) model. Moreover, using polyhedral duality, we characterize the set of all transformations of a game that keep it indistinguishable under the BR model.
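The gap between the two models can be illustrated with a small example; this is an illustration only, not the paper's polyhedral-duality characterization. Applying a strictly increasing but non-affine map to one agent's utilities (here the hypothetical choice x -> x**3) preserves every best response, so BR feedback cannot tell the two games apart. Under a logit QR with rationality parameter `lam` (an assumed functional form), the ratio of two log-odds does not depend on `lam`, and it differs between the two games, so QR feedback separates them. The function names are hypothetical.

```python
import math
from itertools import product


def br_follow(u, rec, opp):
    """BR feedback: the agent follows recommendation `rec` against opponent
    profile `opp` iff `rec` maximizes u[.][opp]."""
    return u[rec][opp] >= max(u[b][opp] for b in range(len(u)))


def qr_prob(u, a, opp, lam):
    """Logit quantal response: probability of playing action a against opp."""
    z = sum(math.exp(lam * u[b][opp]) for b in range(len(u)))
    return math.exp(lam * u[a][opp]) / z


def log_odds_ratio(u, opp, lam):
    """log(P0/P1) / log(P1/P2) equals (u0 - u1) / (u1 - u2), so lam cancels.
    Assumes a three-action agent with u1 != u2 against opp."""
    p = [qr_prob(u, a, opp, lam) for a in range(3)]
    return math.log(p[0] / p[1]) / math.log(p[1] / p[2])


u = [[3.0, 0.0], [1.0, 2.0], [0.0, 1.0]]  # u[own action][opponent profile]
v = [[x ** 3 for x in row] for row in u]    # strictly increasing, not affine

pairs = product(range(3), range(2))
print(all(br_follow(u, a, k) == br_follow(v, a, k) for a, k in pairs))  # True
print(log_odds_ratio(u, 0, lam=1.0), log_odds_ratio(v, 0, lam=1.0))     # about 2.0 and 26.0
```

Strictly increasing maps are only one family of BR-preserving transformations; the theorem characterizes the full indistinguishable set.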

Figures (1)

Figure 1: The utility polytope, constructed as the convex hull of the utility vectors of the games $G(\{A_i\}_{i=1}^n,\{u_i\}_{i=1}^n)$ (left) and $G(\{A_i\}_{i=1}^n,\{v_i\}_{i=1}^n)$ (right), and the best-response region for each action.

Theorems & Definitions (51)

Theorem 1: Informal learnability results
Theorem 2: Informal learning complexity result
Theorem 3: Informal regret bound
Definition 1
Definition 2: Best-response sets
Definition 3: Quantal-response sets
Definition 4: Game equivalence
Remark 1
Definition 5: Indistinguishability under a feedback model
Definition 6: Learnability from a feedback model
...and 41 more