Availability is all you need: achieving optimal regret with minimal information for dynamic matching
Süleyman Kerimov, Pengyu Qian, Mingwei Yang, Sophie H. Yu
TL;DR
This work analyzes availability-based policies for centralized dynamic two-way matching, with regret measured in terms of the general position gap $ε$. It shows that a global availability-based policy, probabilistic matching (PM), achieves the optimal all-time regret scaling $O(ε^{-1})$ on general networks. In acyclic networks, the local tree priority (TP) policy attains $O(ε^{-(d+1)/2})$, where $d$ is the network depth, while the new truncated tree priority (TTP) policy attains the optimal $O(ε^{-1})$. The analysis develops novel multi-step drift techniques and geometric Lyapunov functions, including fractional-arrival extensions, to bound queue lengths and regret. The results demonstrate that minimal binary availability information can suffice for optimal performance, offering robust, low-information policies suitable for queueing and load-balancing contexts. The paper also identifies open questions about extending the optimality of local availability-based policies to general networks and provides empirical evidence supporting the theoretical findings.
Abstract
We study a centralized discrete-time dynamic two-way matching model with finitely many agent types. Agents arrive stochastically over time and join their type-dedicated queues waiting to be matched. We focus on availability-based policies that make matching decisions based solely on agent availability across types (i.e., whether queues are empty or not), rather than relying on complete queue-length information (e.g., the longest-queue policy). We aim to achieve constant regret at all times with optimal scaling in terms of the general position gap, $ε$, which measures the distance of the fluid relaxation from degeneracy. We classify availability-based policies into global and local policies based on the scope of information they utilize. First, for general networks (possibly cyclic), we propose a global availability-based policy, probabilistic matching, and prove that it achieves the optimal all-time regret scaling of $O(ε^{-1})$, matching the known lower bound established by [KAG24]. Second, for acyclic networks, we focus on the class of local availability-based policies, specifically static priority policies that prioritize matches based on a fixed order. Within this class, we derive the first explicit regret bound for the previously proposed tree priority policy, showing all-time regret scaling of $O(ε^{-(d+1)/2})$, where $d$ is the network depth. Next, we introduce a new truncated tree priority policy and prove that it is the first static priority policy to achieve the optimal all-time regret scaling of $O(ε^{-1})$. These policies are appealing for matching systems such as queueing and load balancing; they reduce operational costs by using minimal information while effectively balancing the trade-off between immediate and future rewards.
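To make the notion of an availability-based static priority policy concrete, the following is a minimal sketch in Python. It is not the paper's tree priority or truncated tree priority policy; it only illustrates the general idea of attempting matches in a fixed order while consulting nothing but whether each queue is empty. The function names, the three-type network, the rewards, and the arrival probabilities are all hypothetical, and the one-arrival-per-period model is an assumption made for illustration.

```python
import random


def static_priority_step(queues, priority):
    """Apply a fixed-order matching rule using only queue availability.

    queues   : list of nonnegative ints, one queue per agent type (mutated in place)
    priority : list of (type_i, type_j, reward) tuples, highest priority first
               (hypothetical representation, not taken from the paper)
    Returns the reward collected in this step.
    """
    reward = 0.0
    for i, j, r in priority:
        # The only information consulted is whether each queue is nonempty.
        while queues[i] > 0 and queues[j] > 0:
            queues[i] -= 1
            queues[j] -= 1
            reward += r
    return reward


def simulate(num_periods, arrival_probs, priority, seed=0):
    """Simulate one arrival per period followed by a static priority step.

    Assumes exactly one agent arrives per period, with its type drawn
    according to arrival_probs (an illustrative assumption).
    """
    rng = random.Random(seed)
    types = list(range(len(arrival_probs)))
    queues = [0] * len(arrival_probs)
    total_reward = 0.0
    for _ in range(num_periods):
        arriving = rng.choices(types, weights=arrival_probs, k=1)[0]
        queues[arriving] += 1
        total_reward += static_priority_step(queues, priority)
    return total_reward, queues


# Hypothetical path network 0 - 1 - 2: match (1, 2) has priority over (0, 1).
EXAMPLE_PRIORITY = [(1, 2, 2.0), (0, 1, 1.0)]
EXAMPLE_ARRIVALS = [0.3, 0.4, 0.3]

if __name__ == "__main__":
    reward, final_queues = simulate(1000, EXAMPLE_ARRIVALS, EXAMPLE_PRIORITY)
    print("total reward:", reward)
    print("final queues:", final_queues)
```

In this sketch the policy never compares queue lengths, in contrast to a longest-queue rule; the priority order alone determines which available match is performed first.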
