Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets

Tejas Pagare; Avishek Ghosh

TL;DR

This work tackles learning in decentralized two-sided matching markets where both sides' preferences are unknown. It introduces two algorithms, ETGS with blackboard communication and CA-ETC, which learn both sides' preferences and converge to a stable matching without relying on restrictive market structure, with CA-ETC being fully decentralized. CA-ETC achieves a per-player regret of $\mathcal{O}\left(T_{\circ} \left(\frac{K \log T}{T_{\circ} \Delta^2}\right)^{1/\gamma} + T_{\circ} \left(\frac{T}{T_{\circ}}\right)^{\gamma}\right)$ (for appropriate choices of $T_{\circ}$ and $\gamma$), while a blackboard baseline attains logarithmic regret in $T$; simulations on a 5×5 market corroborate finite-time convergence to correct rankings. The results broaden the applicability of learning in matching markets by removing strong prior assumptions and demonstrating practical, communication-free mechanisms. This work paves the way for more robust two-sided learning, including asynchronous settings and markets with transferable utilities.
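To make the notion of a stable matching concrete, the following minimal sketch runs the classical player-proposing deferred-acceptance (Gale-Shapley) procedure on fully known preferences, which is known to return the player-optimal stable matching. It is the textbook procedure, not ETGS or CA-ETC themselves, and it skips the learning problem entirely. The function name, the dictionary-based input format, and the 3×3 example market are assumptions made for illustration.

```python
def player_proposing_gale_shapley(player_prefs, arm_prefs):
    """Textbook deferred acceptance with players proposing.

    player_prefs[p] lists arms from most to least preferred;
    arm_prefs[a] lists players from most to least preferred.
    Both input formats are hypothetical, chosen for illustration.
    Returns a dict mapping each matched player to its arm.
    """
    # rank[a][p]: position of player p in arm a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in arm_prefs.items()}
    next_choice = {p: 0 for p in player_prefs}  # next arm index to propose to
    held = {}                                   # arm -> player it currently holds
    free = list(player_prefs)                   # players without a tentative match

    while free:
        p = free.pop()
        if next_choice[p] >= len(player_prefs[p]):
            continue  # p has exhausted its list and stays unmatched
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = held.get(a)
        if current is None:
            held[a] = p              # arm was free: tentatively accept
        elif rank[a][p] < rank[a][current]:
            held[a] = p              # arm prefers the new proposer
            free.append(current)     # previously held player is released
        else:
            free.append(p)           # proposal rejected: p tries its next arm

    return {p: a for a, p in held.items()}


# Hypothetical 3x3 market used only as an example.
example_player_prefs = {
    "p1": ["a1", "a2", "a3"],
    "p2": ["a1", "a3", "a2"],
    "p3": ["a2", "a1", "a3"],
}
example_arm_prefs = {
    "a1": ["p2", "p1", "p3"],
    "a2": ["p1", "p3", "p2"],
    "a3": ["p3", "p2", "p1"],
}
example_matching = player_prefs_result = player_proposing_gale_shapley(
    example_player_prefs, example_arm_prefs
)
```

In this example market, no player and arm both prefer each other over their assigned partners, which is the stability condition.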

Abstract

Online learning in decentralized two-sided matching markets, where the demand side (players) competes to match with the supply side (arms), has received substantial interest because it abstracts out the complex interactions in matching platforms (e.g., UpWork, TaskRabbit). However, past works assume that each arm knows its preference ranking over the players (one-sided learning), and each player aims to learn its preference over arms through successive interactions. Moreover, several (impractical) assumptions on the problem are usually made for theoretical tractability, such as broadcast player-arm match (Liu et al., 2020; 2021; Kong & Li, 2023) or serial dictatorship (Sankararaman et al., 2021; Basu et al., 2021; Ghosh et al., 2022). In this paper, we study a decentralized two-sided matching market in which the arms do not know their preference ranking over players a priori. Furthermore, we make no structural assumptions on the problem. We propose a multi-phase explore-then-commit type algorithm, namely epoch-based CA-ETC (collision-avoidance explore-then-commit; CA-ETC for short), that does not require any communication across agents (players and arms) and is hence decentralized. We show that, for an initial epoch length of $T_{\circ}$ and subsequent epoch lengths of $2^{l/\gamma} T_{\circ}$ (for the $l$-th epoch, with $\gamma \in (0,1)$ an input parameter to the algorithm), CA-ETC yields a player-optimal expected regret of $\mathcal{O}\left(T_{\circ} \left(\frac{K \log T}{T_{\circ} \Delta^2}\right)^{1/\gamma} + T_{\circ} \left(\frac{T}{T_{\circ}}\right)^{\gamma}\right)$ for the $i$-th player, where $T$ is the learning horizon, $K$ is the number of arms, and $\Delta$ is an appropriately defined problem gap. Furthermore, we propose a blackboard-communication-based baseline achieving logarithmic regret in $T$.
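As a rough illustration of the epoch schedule described above, the sketch below lists epoch lengths $2^{l/\gamma} T_{\circ}$ until the horizon $T$ is used up. Several details are assumptions not stated in the abstract: epochs are indexed from $l = 0$ (so that the first epoch has length $T_{\circ}$), non-integer lengths are rounded up, and the final epoch is truncated at the horizon. The function name `epoch_lengths` is hypothetical.

```python
import math


def epoch_lengths(T, T0, gamma):
    """List epoch lengths 2^(l/gamma) * T0 for l = 0, 1, ... up to horizon T.

    Assumptions (not taken from the paper): indexing starts at l = 0,
    lengths are rounded up to integers, and the last epoch is cut off
    so that the lengths sum to exactly T.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    if T <= 0 or T0 <= 0:
        raise ValueError("T and T0 must be positive")

    lengths = []
    used = 0  # rounds consumed by earlier epochs
    l = 0
    while used < T:
        nominal = math.ceil(2 ** (l / gamma) * T0)
        length = min(nominal, T - used)  # truncate the final epoch
        lengths.append(length)
        used += length
        l += 1
    return lengths


# Example: T = 1000, T0 = 10, gamma = 0.5 gives nominal lengths 10, 40, 160, 640, ...
schedule = epoch_lengths(1000, 10, 0.5)
```

Because the lengths grow geometrically with ratio $2^{1/\gamma}$, the number of epochs needed to cover the horizon is only logarithmic in $T / T_{\circ}$.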

Paper Structure (12 sections, 2 theorems, 6 equations, 1 figure, 1 table, 4 algorithms)

Introduction
Summary of Contributions
Decentralized two-sided learning algorithm:
No structural assumptions on markets:
Problem Setting
Stable matching:
Regret:
Algorithms for Two-sided matching markets
Warmup: Learning with Blackboard
Epoch-based CA-ETC
Simulations
Discussion and Future Work

Key Result

Theorem 1

Suppose every player runs the blackboard player algorithm (algo:blackboard) and every arm runs the blackboard arm algorithm (algo:blackboardarm) for $T$ iterations. Then the player-optimal regret of player $p_i$ is upper bounded by a quantity that is logarithmic in $T$. A similar upper bound holds for the arm-pessimal regret.

Figures (1)

Figure 1: 5x5 market cumulative regret plot

Theorems & Definitions (10)

Definition 1
Theorem 1: Regret of Algorithm algo:blackboard
Remark 1: Different terms
Remark 2
Remark 3
Theorem 2
Remark 4
Remark 5: Type of CA-ETC
Remark 6: Different terms
Remark 7: Choice of $T_{\circ}$
