Learning to Recommend in Unknown Games

Arwa Alanqary, Zakaria Baba, Manxi Wu, Alexandre M. Bayen

TL;DR

Under quantal-response feedback, the game is learnable, up to a positive affine equivalence class, with sample complexity logarithmic in the desired precision, whereas best-response feedback can only narrow agents' utilities down to a larger set.

Abstract

We study preference learning through recommendations in multi-agent game settings, where a moderator repeatedly interacts with agents whose utility functions are unknown. In each round, the moderator issues action recommendations and observes whether agents follow or deviate from them. We consider two canonical behavioral feedback models, best response and quantal response, and study how the information revealed by each model affects the learnability of agents' utilities. We show that under quantal-response feedback the game is learnable, up to a positive affine equivalence class, with sample complexity logarithmic in the desired precision, whereas best-response feedback can only narrow agents' utilities down to a larger set. We give a complete geometric characterization of this set. Moreover, we introduce a regret notion based on agents' incentives to deviate from recommendations and design an online algorithm with low regret under both feedback models, with bounds scaling linearly in the game dimension and logarithmically in time. Our results lay a theoretical foundation for AI recommendation systems in strategic multi-agent environments, where compliance with recommendations is shaped by strategic interaction.
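To make the contrast between the two feedback models concrete, here is a minimal single-agent sketch. It is not the paper's algorithm or its exact feedback definition: it assumes the common logit form of quantal response, a fixed rationality parameter of 1, and toy utility vectors `u` and `v` that are purely hypothetical. It shows two utility vectors that best-response feedback cannot tell apart but logit quantal response can, and that the logit probabilities are unchanged when a constant is added to every utility.

```python
import math

def best_response(utilities):
    """Index of the highest-utility action (ties broken by lowest index)."""
    return max(range(len(utilities)), key=lambda a: utilities[a])

def logit_quantal_response(utilities, rationality=1.0):
    """Logit choice probabilities, P(a) proportional to exp(rationality * u[a]).

    Assumption: the logit form is one common quantal-response model; the
    paper's own definition is not reproduced here.
    """
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(rationality * (x - m)) for x in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical utilities of one agent over three actions.
u = [1.0, 0.0, -1.0]
v = [5.0, 0.5, 0.0]               # same ranking as u, different gaps
shifted_u = [x + 3.0 for x in u]  # u plus a constant

# Best response sees only the top action, so u and v look identical.
same_best_response = best_response(u) == best_response(v)

# Quantal response exposes the utility gaps, so u and v differ.
qr_u = logit_quantal_response(u)
qr_v = logit_quantal_response(v)
qr_shifted = logit_quantal_response(shifted_u)

# Under the logit assumption, log-odds recover the utility gap u[0] - u[1].
recovered_gap = math.log(qr_u[0] / qr_u[1])
```

In this toy setting the choice probabilities carry cardinal information (utility differences), while the best response carries only ordinal information, which is the intuition behind best-response feedback pinning utilities down less precisely.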

Paper Structure (43 sections, 24 theorems, 95 equations, 1 figure, 5 algorithms)

Key Result

Theorem 1

The game's utility functions are learnable under the quantal-response (QR) model, but not under the best-response (BR) model. Moreover, using polyhedral duality, we characterize the set of all transformations of a game that keep it indistinguishable under the BR model.

Figures (1)

  • Figure 1: The utility polytope constructed as the convex hull of the utility vectors of the games $G(\{A_i\}_{i=1}^n,\{u_i\}_{i=1}^n)$ (left) and $G(\{A_i\}_{i=1}^n,\{v_i\}_{i=1}^n)$ (right) and the best response region for each action.

Theorems & Definitions (51)

  • Theorem 1: Informal learnability results
  • Theorem 2: Informal learning complexity result
  • Theorem 3: Informal regret bound
  • Definition 1
  • Definition 2: Best-response sets
  • Definition 3: Quantal-response sets
  • Definition 4: Game equivalence
  • Remark 1
  • Definition 5: Indistinguishability under a feedback model
  • Definition 6: Learnability from a feedback model
  • ...and 41 more