Data-Driven Mechanism Design using Multi-Agent Revealed Preferences

Luke Snow, Vikram Krishnamurthy

TL;DR

This work introduces a data-driven mechanism-design framework for a sequence of independent one-shot games where the designer observes only equilibrium actions and has no access to agent utilities. It casts mechanism design as a revealed-preference problem, deriving necessary and sufficient linear-inequality conditions (MM-GARP) that certify when observed mixed-strategy equilibria can be socially optimal, and defines a Pareto gap loss L(φ) that an SPSA-based Algorithm 1 minimizes to either achieve social optimality or certify its impossibility. The framework connects to robust revealed-preference metrics (CCEI, GARP_F) and extends to finite-sample settings through a distributionally robust RL formulation, reformulated as a semi-infinite program solvable via exchange methods. Two numerical experiments—wireless spectrum sharing and river-pollution games—demonstrate rapid convergence of the Pareto gap and tangible welfare gains using only equilibrium data. Overall, the paper provides a rigorous, data-driven pathway to design mechanisms that steer decentralized, utility-unknown agents toward socially desirable outcomes, with principled guarantees and finite-sample robustness.
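The TL;DR relates the Pareto gap to robust revealed-preference metrics such as CCEI. For readers less familiar with that metric, here is a minimal sketch of its classical single-agent form. It is not MM-GARP itself. It assumes linear budgets given by price vectors `prices[t]` and chosen bundles `bundles[t]` (illustrative names, not from the paper), tests the efficiency-relaxed condition GARP_e via a transitive closure, and finds Afriat's critical cost efficiency index by bisection as the largest e in [0, 1] for which GARP_e holds.

```python
def dot(p, x):
    """Inner product p . x of a price vector and a bundle."""
    return sum(pi * xi for pi, xi in zip(p, x))


def satisfies_garp(prices, bundles, e=1.0):
    """Classical GARP_e test for linear-budget data; e = 1 gives standard GARP."""
    T = len(bundles)
    # cost[t][s] = p_t . x_s, the cost of bundle s at the prices of observation t.
    cost = [[dot(prices[t], bundles[s]) for s in range(T)] for t in range(T)]
    # Direct e-revealed preference: x_t R_e x_s iff e * p_t . x_t >= p_t . x_s.
    R = [[e * cost[t][t] >= cost[t][s] for s in range(T)] for t in range(T)]
    # Transitive closure of R_e (Warshall's algorithm).
    for k in range(T):
        for i in range(T):
            if R[i][k]:
                for j in range(T):
                    if R[k][j]:
                        R[i][j] = True
    # Violation: x_t R_e* x_s while x_s is strictly e-revealed preferred to x_t,
    # i.e. e * p_s . x_s > p_s . x_t.
    for t in range(T):
        for s in range(T):
            if R[t][s] and e * cost[s][s] > cost[s][t]:
                return False
    return True


def ccei(prices, bundles, tol=1e-6):
    """Critical cost efficiency index: largest e in [0, 1] with GARP_e, by bisection.

    Assumes nonnegative prices and bundles, so GARP_0 always holds, and relies on
    GARP_e implying GARP_e' for every e' < e (monotonicity makes bisection valid).
    """
    if satisfies_garp(prices, bundles, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if satisfies_garp(prices, bundles, mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Two observations violating GARP: each bundle was chosen while the other was cheaper.
    prices = [(2.0, 1.0), (1.0, 2.0)]
    bundles = [(2.0, 1.0), (1.0, 2.0)]
    print(satisfies_garp(prices, bundles), ccei(prices, bundles))
```

With e = 1 the test is standard GARP, which by Afriat's theorem holds exactly when the classical Afriat linear inequalities are feasible; a CCEI below 1 quantifies how far the data are from being rationalizable.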

Abstract

We study a sequence of independent one-shot non-cooperative games where agents play equilibria determined by a tunable mechanism. Observing only equilibrium decisions, without parametric or distributional knowledge of utilities, we aim to steer equilibria towards social optimality, and to certify when this is impossible due to the game's structure. We develop an adaptive RL framework for this mechanism design objective. First, we derive a multi-agent revealed-preference test for Pareto optimality that gives necessary and sufficient conditions for the existence of utilities under which the empirically observed mixed-strategy Nash equilibria are socially optimal. The conditions form a tractable linear program. Using this, we build an IRL step that computes the Pareto gap, the distance of observed strategies from Pareto optimality, and couple it with a policy-gradient update. We prove convergence to a mechanism that globally minimizes the Pareto gap. This yields a principled achievability test: if social optimality is attainable for the given game and observed equilibria, Algorithm 1 attains it; otherwise, the algorithm certifies unachievability while converging to the mechanism closest to social optimality. We also show a tight link between our loss and robust revealed-preference metrics, allowing algorithmic suboptimality to be interpreted through established microeconomic notions. Finally, when only finitely many i.i.d. samples from mixed strategies (partial strategy specifications) are available, we derive concentration bounds for convergence and design a distributionally robust RL procedure that attains the mechanism-design objective for the fully specified strategies.
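The TL;DR states that the distributionally robust step is recast as a semi-infinite program and solved with exchange methods. The sketch below illustrates the generic exchange (cutting-plane) idea on a hypothetical toy problem, min over x of max over y in [0, 1] of (x - y)^2: solve the problem over a finite active set of y values, add the most violated y, and repeat. The objective, the grid used for the separation step, the ternary-search master solver, and all function names are illustrative assumptions, not the paper's DRO formulation.

```python
def ternary_search(g, lo, hi, iters=200):
    """Minimize a unimodal function g on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def exchange_method(f, y_grid, x_lo, x_hi, y_init, tol=1e-8, max_iter=50):
    """Exchange method for min_x max_{y in Y} f(x, y), with Y approximated by y_grid.

    Assumes f(., y) is convex in x for every y, so the master objective
    (a pointwise max of convex functions) is unimodal and ternary search applies.
    """
    active = [y_init]
    x = x_lo
    for _ in range(max_iter):
        def master(z):
            # Relaxed problem: worst case over the current finite active set only.
            return max(f(z, y) for y in active)

        x = ternary_search(master, x_lo, x_hi)
        # Separation step: the most violated index over the full (gridded) set Y.
        y_new = max(y_grid, key=lambda y: f(x, y))
        if f(x, y_new) <= master(x) + tol:
            break  # no index is violated by more than tol, so stop
        active.append(y_new)
    return x, active


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    x_star, active = exchange_method(lambda x, y: (x - y) ** 2, grid, -1.0, 2.0, 0.0)
    print(x_star, active)
```

On this toy problem the method stops after the active set grows to the two endpoints y = 0 and y = 1, returning x close to 0.5; the same loop structure applies when the inner index set is a family of distributions rather than an interval.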

Paper Structure

This paper contains 75 sections, 14 theorems, 87 equations, 8 figures, and 2 algorithms.

Key Result

Theorem 1

Consider a dataset $\mathcal{D} = \{\mathcal{A}_{\phi,t},\mu_t, t \in [T]\}$ comprising constraints and mixed strategies. Under Assumption as:confun, the following are equivalent:

Figures (8)

  • Figure 1: The reinforcement learning (RL) framework for achieving mechanism design comprises two steps: mechanism evaluation and mechanism adjustment. The mechanism evaluation step uses revealed preference theory as a form of inverse reinforcement learning (IRL) to determine the Pareto gap $L(\mu(\phi_k))$, defined in the body of the paper. This measures the proximity of observed mixed-strategy equilibria $\mu(\phi_k) \in \mathcal{N}(\phi_k)$ to social optimality $\mathcal{S}(\phi_k)$. The mechanism adjustment step uses this IRL metric to perform policy optimization, updating the mechanism with a simultaneous perturbation stochastic approximation (SPSA) algorithm. Thus, the procedure operates without analytical or observational knowledge of the agent utility functions. The novelty lies in our IRL approach to policy evaluation, which generalizes results in the theory of microeconomic revealed preferences.
  • Figure 2: Pure-strategy mechanism flowchart. The mechanism modulates the mapping from joint-action to outcome space. The agent utilities are real-valued functions of outcomes. Thus, the mechanism equivalently modulates the utility functions' dependence on joint-actions.
  • Figure 3: Mixed-strategy mechanism flowchart. Mechanism parameter $\phi$ modulates the mapping from joint-action to outcome space. Thus, $\phi$ equivalently modulates the utility functions' dependence on joint-actions. In the mixed-strategy regime each agent specifies a distribution $\mu^i$ over actions $a^i$, so that joint-actions $\mathbf{a}$ are drawn at random from the product measure $\mu$. The diagram separates two perspectives on this process: the top flowchart corresponds to the pure-strategy utility gained for a particular joint-action $\mathbf{a}$ drawn from $\mu$. However, in the mixed-strategy regime agents aim to maximize their expected utility over the product measure $\mu$, as represented by the bottom arrow.
  • Figure 4: Interaction procedure. At each time $t$, the designer can provide mechanism $o_{\phi}$ and constraint functions $\{g_{\phi,t}^i(\cdot)\}_{i=1}^M$, and observes the mixed-strategy Nash equilibrium response $\mu_t(\phi)$.
  • Figure 5: Implementation of the distributionally robust optimization (DRO) algorithm. Within Algorithm 1, instead of solving the minimization in lines 18-20, we solve the distributionally robust optimization problem by running the DRO algorithm.
  • ...and 3 more figures
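Figure 1 describes the mechanism-adjustment step as an SPSA update driven by the Pareto gap. Below is a minimal textbook SPSA sketch for a black-box loss. The quadratic loss in the usage example is a hypothetical stand-in for the IRL-computed Pareto gap $L(\mu(\phi))$, and the gain sequences $a_k = a/(k+1+A)^{0.602}$, $c_k = c/(k+1)^{0.101}$ are Spall's commonly used defaults, assumed here rather than taken from the paper.

```python
import random


def spsa_minimize(loss, phi0, iters=2000, a=0.1, c=0.1, A=100.0, seed=0):
    """Minimize a (possibly noisy) black-box loss over a real vector phi with SPSA."""
    rng = random.Random(seed)
    phi = list(phi0)
    d = len(phi)
    for k in range(iters):
        a_k = a / (k + 1 + A) ** 0.602  # step-size sequence
        c_k = c / (k + 1) ** 0.101      # perturbation-size sequence
        # Rademacher perturbation: each component is +1 or -1 with equal probability.
        delta = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        plus = [p + c_k * di for p, di in zip(phi, delta)]
        minus = [p - c_k * di for p, di in zip(phi, delta)]
        # Two loss evaluations give a gradient estimate for all d coordinates at once.
        diff = loss(plus) - loss(minus)
        grad = [diff / (2.0 * c_k * di) for di in delta]
        phi = [p - a_k * g for p, g in zip(phi, grad)]
    return phi


if __name__ == "__main__":
    # Hypothetical stand-in loss with minimizer (1, -2).
    print(spsa_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [0.0, 0.0]))
```

Each iteration needs only two loss evaluations regardless of the dimension of $\phi$, which is one reason SPSA is attractive when every evaluation of the loss requires observing a new equilibrium.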

Theorems & Definitions (34)

  • Definition 1: Mixed-Strategy Nash Equilibrium
  • Definition 2: Mixed-Strategy Social Optimality
  • Definition 3: Mechanism Design
  • Definition 4: Constraint Functions
  • Definition 5: Consistency with Social Optimality
  • Definition 6: MM-GARP
  • Theorem 1
  • Proof
  • Lemma 1
  • Proof
  • ...and 24 more
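Definition 1 concerns mixed-strategy Nash equilibria, and Figure 3 notes that agents maximize expected utility under the product measure $\mu$. A minimal sketch of the standard equilibrium check for a two-player finite game follows. It assumes known payoff matrices `u1`, `u2` purely for illustration (the paper's designer never observes utilities), and the function names are hypothetical.

```python
def expected_utility(payoff, mu1, mu2):
    """Expected payoff E[payoff[a1][a2]] with a1 ~ mu1, a2 ~ mu2 drawn independently."""
    return sum(
        mu1[i] * mu2[j] * payoff[i][j]
        for i in range(len(mu1))
        for j in range(len(mu2))
    )


def is_mixed_nash(u1, u2, mu1, mu2, tol=1e-9):
    """Check that neither player gains more than tol by unilaterally deviating."""
    v1 = expected_utility(u1, mu1, mu2)
    v2 = expected_utility(u2, mu1, mu2)
    # Expected utility is linear in a player's own mixture, so pure deviations suffice.
    for i in range(len(mu1)):
        pure = [1.0 if k == i else 0.0 for k in range(len(mu1))]
        if expected_utility(u1, pure, mu2) > v1 + tol:
            return False
    for j in range(len(mu2)):
        pure = [1.0 if k == j else 0.0 for k in range(len(mu2))]
        if expected_utility(u2, mu1, pure) > v2 + tol:
            return False
    return True


if __name__ == "__main__":
    # Matching pennies: the uniform mixture for both players is the unique equilibrium.
    u1 = [[1.0, -1.0], [-1.0, 1.0]]
    u2 = [[-1.0, 1.0], [1.0, -1.0]]
    print(is_mixed_nash(u1, u2, [0.5, 0.5], [0.5, 0.5]))
```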