Trial-and-Error Learning in Decentralized Matching Markets

Vade Shah, Bryce L. Ferguson, Jason R. Marden

TL;DR

The paper tackles stability in decentralized two-sided matching markets where agents do not know their own preferences and no central matcher exists. It proposes completely uncoupled trial-and-error learning policies and proves that they converge to a stable matching with high probability, without any coordination between agents. Furthermore, it shows that if the acceptors adopt a more sophisticated policy while the proposers use PTL, the system can be steered to the acceptor-optimal stable matching, illustrating a form of exploitability when agents can model others' learning rules. Using the framework of regular perturbed Markov processes, the authors characterize the stochastically stable outcomes and give constructive policy designs that guarantee convergence to stable configurations.

Abstract

Two-sided matching markets, environments in which two disjoint groups of agents seek to partner with one another, arise in several contexts. In static, centralized markets where agents know their preferences, standard algorithms can yield a stable matching. However, in dynamic, decentralized markets where agents must learn their preferences through interaction, such algorithms cannot be used. Our goal in this paper is to identify achievable stability guarantees in decentralized matching markets where (i) agents have limited information about their preferences and (ii) no central entity determines the match. Surprisingly, our first result demonstrates that these constraints do not preclude stability--simple "trial and error" learning policies guarantee convergence to a stable matching without requiring coordination between agents. Our second result shows that more sophisticated policies can direct the system toward a particular group's optimal stable matching. This finding highlights an important dimension of strategic learning: when agents can accurately model others' policies, they can adapt their own behavior to systematically influence outcomes in their favor--a phenomenon with broad implications for learning in multi-agent systems.
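For the static, centralized case, the abstract does not name the "standard algorithms" it has in mind; the textbook example is Gale-Shapley deferred acceptance. The sketch below is a minimal implementation assuming complete, strict preference lists; the function name and the 3×3 example market are hypothetical and not taken from the paper. Running it with the roles swapped illustrates the distinction behind the paper's second result: the proposing side obtains its optimal stable matching, so letting acceptors propose yields the acceptor-optimal one.

```python
from typing import Dict, List


def deferred_acceptance(
    proposer_prefs: Dict[str, List[str]],
    acceptor_prefs: Dict[str, List[str]],
) -> Dict[str, str]:
    """Proposer-proposing deferred acceptance (Gale-Shapley).

    Assumes complete, strict rankings (most preferred first).
    Returns a dict mapping each matched proposer to its acceptor.
    """
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better)
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next acceptor to try
    holds: Dict[str, str] = {}  # acceptor -> proposer it tentatively holds
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has exhausted its list and stays unmatched
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = holds.get(a)
        if current is None:
            holds[a] = p
        elif rank[a][p] < rank[a][current]:
            holds[a] = p
            free.append(current)  # the displaced proposer tries again
        else:
            free.append(p)  # a rejects p; p moves down its list
    return {p: a for a, p in holds.items()}


# Hypothetical market whose proposer-optimal and acceptor-optimal matchings differ.
proposers = {"P1": ["A1", "A2", "A3"], "P2": ["A2", "A3", "A1"], "P3": ["A3", "A1", "A2"]}
acceptors = {"A1": ["P2", "P3", "P1"], "A2": ["P3", "P1", "P2"], "A3": ["P1", "P2", "P3"]}
print(deferred_acceptance(proposers, acceptors))  # proposer-optimal stable matching
print(deferred_acceptance(acceptors, proposers))  # acceptor-optimal (roles swapped)
```

In this market every agent on the proposing side receives its first choice, so the two runs return two different stable matchings of the same market.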

Paper Structure

This paper contains 10 sections, 3 theorems, 1 equation, 5 figures, 5 algorithms.

Key Result

Theorem 1

Consider a matching market $(\mathbf{P}, \mathbf{A}, \mathbf{U})$. For any $\delta \in (0, 1]$, there exist completely uncoupled policies $\pi_\mathbf{P}$ and $\pi_\mathbf{A}$ such that $\mathbb{P}[\mu^t \in \mathbf{M}^{\rm st}] > 1 - \delta$ for all sufficiently large $t$, where $\mu^t$ denotes the matching at time $t$ and $\mathbf{M}^{\rm st}$ the set of stable matchings.
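Membership in $\mathbf{M}^{\rm st}$ can be checked directly. The sketch below assumes the standard definition (a matching is stable if no proposer and acceptor both prefer each other to their current partners) and complete, strict preference lists in which every partner is acceptable. `blocking_pairs`, `stable_matchings`, and the example market are hypothetical illustrations, and the brute-force enumeration only suits small markets with equally sized sides.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Prefs = Dict[str, List[str]]  # agent -> ranked partners, most preferred first


def blocking_pairs(matching: Dict[str, str], proposer_prefs: Prefs, acceptor_prefs: Prefs) -> List[Tuple[str, str]]:
    """Pairs (p, a) who both strictly prefer each other to their current partners."""
    holder = {a: p for p, a in matching.items()}  # acceptor -> matched proposer
    pairs = []
    for p, ranking in proposer_prefs.items():
        for a in ranking:
            if matching.get(p) == a:
                break  # every remaining acceptor is ranked below p's partner
            current = holder.get(a)
            a_ranking = acceptor_prefs[a]
            # a prefers p if a is unmatched or ranks p above its partner
            if current is None or a_ranking.index(p) < a_ranking.index(current):
                pairs.append((p, a))
    return pairs


def stable_matchings(proposer_prefs: Prefs, acceptor_prefs: Prefs) -> List[Dict[str, str]]:
    """Brute-force M^st over perfect matchings (equal-sized sides only)."""
    ps, accs = sorted(proposer_prefs), sorted(acceptor_prefs)
    found = []
    for perm in permutations(accs):
        mu = dict(zip(ps, perm))
        if not blocking_pairs(mu, proposer_prefs, acceptor_prefs):
            found.append(mu)
    return found


proposers = {"P1": ["A1", "A2", "A3"], "P2": ["A2", "A3", "A1"], "P3": ["A3", "A1", "A2"]}
acceptors = {"A1": ["P2", "P3", "P1"], "A2": ["P3", "P1", "P2"], "A3": ["P1", "P2", "P3"]}
for mu in stable_matchings(proposers, acceptors):
    print(mu)
```

For this market the enumeration finds three stable matchings: the proposer-optimal one, the acceptor-optimal one, and a third in which every agent on both sides gets its second choice.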

Figures (5)

  • Figure 1: A two-sided matching market with 3 proposers and 3 acceptors. In a static, centralized market with complete information (left), an algorithm gathers agents' known preferences and assigns a matching. In a dynamic, decentralized environment with two-sided uncertainty (right), agents learn their own preferences over one another as they interact and form matchings.
  • Figure 2: The iterative feedback process through which agents select actions. For simplicity, the diagram explicitly depicts the behavior only of $P_i$ and $A_j$, but the process depends on the behavior of all agents. The dashed line indicates that $A_j$ observes $a_{P_i}^t$ only if $a_{P_i}^t = A_j$.
  • Figure : Proposer action selection rule
  • Figure : Acceptor action selection rule
  • Figure : Acceptor-optimal state update rule
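The observation structure in Figure 2 can be sketched as a single round: each proposer picks an acceptor (or abstains), and each acceptor sees only the proposals addressed to it. The uniformly random `random_propose` and `random_accept` rules below are hypothetical placeholders, not the paper's trial-and-error policies, and `play_round` and its parameters are likewise assumed names for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical policy signatures: a proposer picks an acceptor or None;
# an acceptor picks one proposer from its inbox or None.
ProposeRule = Callable[[str, random.Random], Optional[str]]
AcceptRule = Callable[[str, List[str], random.Random], Optional[str]]


def play_round(
    proposers: List[str],
    acceptors: List[str],
    propose: ProposeRule,
    accept: AcceptRule,
    rng: random.Random,
) -> Tuple[Dict[str, Optional[str]], Dict[str, List[str]], Dict[str, str]]:
    # Each proposer P_i chooses its action a_{P_i}^t: an acceptor, or None.
    actions = {p: propose(p, rng) for p in proposers}
    # A_j observes a_{P_i}^t only if a_{P_i}^t == A_j (the dashed line in
    # Figure 2), so each acceptor sees only the proposals sent to it.
    inbox = {a: [p for p in proposers if actions[p] == a] for a in acceptors}
    # Each acceptor accepts at most one proposer from its own inbox.
    matching: Dict[str, str] = {}
    for a in acceptors:
        chosen = accept(a, inbox[a], rng)
        if chosen is not None:
            matching[chosen] = a  # proposer -> acceptor
    return actions, inbox, matching


PROPOSERS = ["P1", "P2", "P3"]
ACCEPTORS = ["A1", "A2", "A3"]


def random_propose(p: str, rng: random.Random) -> Optional[str]:
    # Placeholder only: propose to a uniformly random acceptor or abstain.
    return rng.choice([*ACCEPTORS, None])


def random_accept(a: str, received: List[str], rng: random.Random) -> Optional[str]:
    # Placeholder only: accept a uniformly random proposal, if any arrived.
    return rng.choice(received) if received else None


actions, inbox, matching = play_round(PROPOSERS, ACCEPTORS, random_propose, random_accept, random.Random(0))
print(actions)
print(matching)
```

Because each proposer sends at most one proposal, it appears in at most one inbox, so the resulting matching is always one-to-one regardless of which placeholder rules are plugged in.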

Theorems & Definitions (17)

  • Theorem 1
  • Theorem 2
  • Theorem (Young, 1993)
  • Claim 1.1
  • Proof
  • Claim 1.2
  • Proof
  • Claim 1.3
  • Proof
  • Claim 1.4
  • ...and 7 more
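The Young (1993) theorem listed above is the standard tool for regular perturbed Markov processes: the stochastically stable states lie in the recurrent classes of the unperturbed process with minimum stochastic potential, that is, the minimum total resistance over spanning trees directed into that class. The brute-force sketch below computes this for a small, hypothetical resistance matrix between recurrent classes; the function names and numbers are illustrative assumptions, not values from the paper.

```python
from itertools import product
from math import inf
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def _reaches_root(v: int, parent: Dict[int, int], root: int, n: int) -> bool:
    # Follow parent pointers; a chain to the root needs at most n - 1 steps,
    # so failing to hit the root within n steps means v sits on a cycle.
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return v == root


def stochastic_potential(R: Matrix, root: int) -> float:
    """Minimum total resistance over spanning trees directed into `root`.

    R[i][j] is the resistance of the easiest path from recurrent class i to
    recurrent class j of the unperturbed process (math.inf if none exists).
    """
    n = len(R)
    others = [v for v in range(n) if v != root]
    best = inf
    # Brute force: each non-root node picks one outgoing edge (its parent).
    for choice in product(range(n), repeat=len(others)):
        parent = dict(zip(others, choice))
        if any(parent[v] == v for v in others):
            continue  # self-loops are not tree edges
        if all(_reaches_root(v, parent, root, n) for v in others):
            best = min(best, sum(R[v][parent[v]] for v in others))
    return best


def stochastically_stable(R: Matrix) -> List[int]:
    """Indices of recurrent classes with minimum stochastic potential."""
    potentials = [stochastic_potential(R, j) for j in range(len(R))]
    lowest = min(potentials)
    return [j for j, g in enumerate(potentials) if g == lowest]


# Hypothetical resistances between three recurrent classes (diagonal unused).
R_EXAMPLE = [
    [0, 2, 3],
    [1, 0, 2],
    [2, 1, 0],
]
print([stochastic_potential(R_EXAMPLE, j) for j in range(3)])
print(stochastically_stable(R_EXAMPLE))
```

Here class 0 has stochastic potential 2 (via the tree 2 → 1 → 0), against 3 and 4 for the other classes, so it is the unique stochastically stable class. The enumeration grows as $n^{n-1}$ in the number of classes, so it only suits toy examples.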