Automated Design of Affine Maximizer Mechanisms in Dynamic Settings

Michael Curry; Vinzenz Thoma; Darshan Chakrabarti; Stephen McAleer; Christian Kroer; Tuomas Sandholm; Niao He; Sven Seuken

TL;DR

The paper introduces a general framework for automated dynamic mechanism design by extending affine maximizer mechanisms (AMAs) to Markov decision processes (MDPs). It casts the search for high-performing mechanisms as a bilevel optimization: the outer level selects AMA parameters $(w,b)$, and the inner level solves an MDP to maximize the affine social welfare defined by those parameters. The authors establish differentiability of the expected revenue under this setup and develop three optimization strategies to learn effective AMA parameters: grid search, zeroth-order gradient estimation, and differentiation through a regularized linear program. Empirical results across sequential sales, dynamic task scheduling, and grid navigation show that the resulting mechanisms are truthful and individually rational (IR), and that they consistently outperform dynamic VCG baselines in revenue and makespan, with gradient-based methods offering superior efficiency and scalability. The approach is broadly applicable to dynamic problems with general valuation spaces and arbitrary loss objectives, and it lays the groundwork for scaling to more complex domains and integrating with deep RL.
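
One way to write the bilevel problem described above is sketched here. The notation is an assumption made for illustration and may differ from the paper's: $\pi$ denotes a policy, $b(s,a)$ a boost indexed by state-action pairs, $r_i$ the reward reported by agent $i$, and $f$ the distribution of the reported rewards $\boldsymbol{r}$.

$$
\min_{w > 0,\; b}\ \mathbb{E}_{\boldsymbol{r}\sim f}\Big[\mathcal{L}\big(\pi^{*}_{w,b,\boldsymbol{r}};\, w, b, \boldsymbol{r}\big)\Big]
\quad\text{s.t.}\quad
\pi^{*}_{w,b,\boldsymbol{r}} \in \arg\max_{\pi}\ \mathbb{E}_{\pi}\Big[\sum_{t}\Big(\sum_{i} w_i\, r_i(s_t,a_t) + b(s_t,a_t)\Big)\Big]
$$

With $w = \mathbf{1}$ and $b = 0$ the inner problem reduces to plain welfare maximization, which is the allocation rule of the dynamic VCG baseline.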

Abstract

Dynamic mechanism design is a challenging extension to ordinary mechanism design in which the mechanism designer must make a sequence of decisions over time in the face of possibly untruthful reports of participating agents. Optimizing dynamic mechanisms for welfare is relatively well understood. However, there has been less work on optimizing for other goals (e.g., revenue), and without restrictive assumptions on valuations, it is remarkably challenging to characterize good mechanisms. Instead, we turn to automated mechanism design to find mechanisms with good performance in specific problem instances. In fact, the situation is similar even in static mechanism design. However, in the static case, optimization/machine learning-based automated mechanism design techniques have been successful in finding high-revenue mechanisms in cases beyond the reach of analytical results. We extend the class of affine maximizer mechanisms to MDPs where agents may untruthfully report their rewards. This extension results in a challenging bilevel optimization problem in which the upper problem involves choosing optimal mechanism parameters, and the lower problem involves solving the resulting MDP. Our approach can find truthful dynamic mechanisms that achieve strong performance on goals other than welfare, and can be applied to essentially any problem setting, without restrictions on valuations, for which RL can learn optimal policies.
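
The bilevel structure in the abstract can be illustrated with a minimal Python sketch on a small tabular, discounted MDP. Everything here is an assumption for illustration and not the authors' implementation: the function names (`solve_affine_mdp`, `policy_value`, `revenue`, `zeroth_order_grad`) are hypothetical, the lower problem is solved by plain value iteration, the payment rule is the VCG-style form familiar from static affine maximizers (the paper's exact dynamic payment rule may differ), and the upper-level step is a generic two-point Gaussian-smoothing gradient estimate.

```python
import numpy as np

def solve_affine_mdp(P, R, w, b, gamma=0.9, iters=500):
    """Lower problem: value iteration on the reward sum_i w[i] * R[i] + b.

    P: (S, A, S) transition probabilities, R: (n, S, A) reported rewards,
    w: (n,) weights, b: (S, A) boosts. Returns a greedy policy of shape (S,).
    """
    affine = np.tensordot(w, R, axes=1) + b       # shape (S, A)
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = affine + gamma * (P @ V)              # shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def policy_value(P, reward, policy, gamma=0.9, start=0):
    """Exact discounted value of a fixed policy for an (S, A) reward table."""
    idx = np.arange(P.shape[0])
    P_pi, r_pi = P[idx, policy], reward[idx, policy]
    V = np.linalg.solve(np.eye(len(idx)) - gamma * P_pi, r_pi)
    return V[start]

def revenue(P, R, w, b, gamma=0.9):
    """Sum of VCG-style affine-maximizer payments (assumed payment form)."""
    pi_all = solve_affine_mdp(P, R, w, b, gamma)
    total = 0.0
    for i in range(R.shape[0]):
        w_others = w.copy()
        w_others[i] = 0.0                         # drop agent i from the objective
        others = np.tensordot(w_others, R, axes=1) + b
        pi_wo = solve_affine_mdp(P, R, w_others, b, gamma)
        best_wo = policy_value(P, others, pi_wo, gamma)   # others' best without i
        with_i = policy_value(P, others, pi_all, gamma)   # others' value with i present
        total += (best_wo - with_i) / w[i]
    return total

def zeroth_order_grad(P, R_samples, w, b, sigma=0.1, rng=None):
    """Upper problem: two-point estimate of the gradient of mean revenue."""
    rng = np.random.default_rng() if rng is None else rng
    u_w, u_b = rng.standard_normal(w.shape), rng.standard_normal(b.shape)

    def mean_rev(w_, b_):
        w_ = np.maximum(w_, 1e-3)                 # keep weights strictly positive
        return np.mean([revenue(P, R, w_, b_) for R in R_samples])

    diff = mean_rev(w + sigma * u_w, b + sigma * u_b) \
         - mean_rev(w - sigma * u_w, b - sigma * u_b)
    scale = diff / (2.0 * sigma)
    return scale * u_w, scale * u_b
```

As a hand-checkable case, take one state, two actions, and two agents with reported rewards `[[1, 0]]` and `[[0, 0.6]]`. With unit weights and zero boosts this sketch gives revenue 6, the dynamic VCG outcome under the assumed payment rule. A boost of 0.2 on the second action raises it to 8 while the chosen action stays the same, which is the kind of gain the upper-level search looks for.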

Paper Structure (39 sections, 5 theorems, 36 equations, 5 tables, 1 algorithm)

Introduction
Our Contributions
Related Work
Maximizing Welfare in Dynamic Mechanisms
Dynamic Mechanism Design for Goals Other Than Welfare
Preference Elicitation from Multiple Agents and Multistage Mechanisms
Static Automated Mechanism Design
Preliminaries
Formal Model of Problem Setting
Environment and policy/allocation rule
Mechanism Design Desiderata
Incentive compatibility and payments
Individual rationality
Background on Affine Maximizers
Dynamic Mechanism Design as Bilevel Optimisation
...and 24 more sections

Key Result

Theorem 4.1

Let $\mathcal{L}$ be a loss function for the problem in Equation eq:bilevel and assume it can be decomposed as follows where it holds for all $k$ that $g_k = \mathcal{O}(\lVert\boldsymbol{r}\rVert_{\infty})$. Assume further that the support of $\boldsymbol{r}$ is compact or $f$ decays sufficiently quickly such that $\mathbb{E}[\lVert\boldsymbol{r}\rVert_{\infty}]$ exists. Then $E_{\boldsymbol{r}

Theorems & Definitions (11)

Remark
Definition 3.1: Affine maximizers
Theorem 4.1
proof : Proof Idea
Lemma 4.2
proof
Theorem 4.3: Pointwise convergence of regularised loss
Corollary 4.4
Theorem A.1
proof
...and 1 more
