Trial-and-Error Learning in Decentralized Matching Markets
Vade Shah, Bryce L. Ferguson, Jason R. Marden
TL;DR
The paper studies stability in decentralized two-sided matching markets where agents have limited information about their own preferences and no central matcher exists. It proposes completely uncoupled trial-and-error learning policies and proves that they converge to a stable matching with high probability, without any coordination between agents. It further shows that if one side adopts a more sophisticated policy while the other uses PTL, the system can converge to the acceptor-optimal stable matching, which illustrates a form of exploitability when agents can model others' learning rules. Using the framework of regular perturbed Markov processes, the authors characterize the stochastically stable outcomes and give constructive policy designs that guarantee convergence to stable configurations.
Abstract
Two-sided matching markets, environments in which two disjoint groups of agents seek to partner with one another, arise in several contexts. In static, centralized markets where agents know their preferences, standard algorithms can yield a stable matching. However, in dynamic, decentralized markets where agents must learn their preferences through interaction, such algorithms cannot be used. Our goal in this paper is to identify achievable stability guarantees in decentralized matching markets where (i) agents have limited information about their preferences and (ii) no central entity determines the match. Surprisingly, our first result demonstrates that these constraints do not preclude stability: simple "trial and error" learning policies guarantee convergence to a stable matching without requiring coordination between agents. Our second result shows that more sophisticated policies can direct the system toward a particular group's optimal stable matching. This finding highlights an important dimension of strategic learning: when agents can accurately model others' policies, they can adapt their own behavior to systematically influence outcomes in their favor, a phenomenon with broad implications for learning in multi-agent systems.
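For readers unfamiliar with the centralized baseline that the abstract contrasts against, the following is a minimal sketch of deferred acceptance (the Gale-Shapley algorithm), a standard algorithm for the static setting in which preferences are known, together with a check for blocking pairs. It is not the trial-and-error policy studied in the paper. The function names, the agent labels, and the preference lists are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Prefs = Dict[str, List[str]]  # agent -> partners listed from most to least preferred


def deferred_acceptance(proposer_prefs: Prefs, acceptor_prefs: Prefs) -> Dict[str, str]:
    """Proposer-proposing deferred acceptance; returns proposer -> acceptor."""
    # rank[a][p] is the position of proposer p in acceptor a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(lst)} for a, lst in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next acceptor to try
    held: Dict[str, str] = {}                     # acceptor -> tentatively held proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has exhausted its list and stays unmatched
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if p not in rank[a]:
            free.append(p)            # a finds p unacceptable
        elif a not in held:
            held[a] = p               # a tentatively accepts its first proposal
        elif rank[a][p] < rank[a][held[a]]:
            free.append(held[a])      # a trades up and releases its previous proposer
            held[a] = p
        else:
            free.append(p)            # a rejects p
    return {p: a for a, p in held.items()}


def blocking_pairs(matching: Dict[str, str], proposer_prefs: Prefs,
                   acceptor_prefs: Prefs) -> List[Tuple[str, str]]:
    """Pairs (p, a) that both strictly prefer each other to their current match."""
    partner_of = {a: p for p, a in matching.items()}
    pairs = []
    for p, lst in proposer_prefs.items():
        for a in lst:
            if matching.get(p) == a:
                break  # remaining acceptors are less preferred than p's partner
            current = partner_of.get(a)
            a_list = acceptor_prefs[a]
            if p in a_list and (current is None or a_list.index(p) < a_list.index(current)):
                pairs.append((p, a))
    return pairs


# Hypothetical two-by-two market.
proposer_prefs = {"p1": ["a1", "a2"], "p2": ["a1", "a2"]}
acceptor_prefs = {"a1": ["p2", "p1"], "a2": ["p1", "p2"]}
matching = deferred_acceptance(proposer_prefs, acceptor_prefs)
```

A matching is stable exactly when `blocking_pairs` returns an empty list. The sketch needs every preference list collected in one place, which is the information and coordination that the decentralized setting described above does not provide.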
