
Availability is all you need: achieving optimal regret with minimal information for dynamic matching

Süleyman Kerimov, Pengyu Qian, Mingwei Yang, Sophie H. Yu

TL;DR

This work analyzes availability-based policies for centralized dynamic two-way matching under the general position gap (GPG) condition. It shows that the global availability-based probabilistic matching (PM) policy achieves the optimal all-time regret scaling $O(\epsilon^{-1})$ on general networks. On acyclic networks, the local tree priority (TP) policy attains $O(\epsilon^{-(d+1)/2})$, where $d$ is the network depth, while the new truncated tree priority (TTP) policy attains the optimal $O(\epsilon^{-1})$. The analysis develops novel multi-step drift techniques and geometric Lyapunov functions, including fractional-arrival extensions, to bound queue lengths and regret. The results demonstrate that minimal binary availability information can suffice for optimal performance, yielding robust, low-information policies suited to queueing and load-balancing applications. The paper also identifies open questions about whether the optimality of local availability-based policies extends to general networks, and it provides empirical evidence supporting the theoretical findings.

Abstract

We study a centralized discrete-time dynamic two-way matching model with finitely many agent types. Agents arrive stochastically over time and join their type-dedicated queues waiting to be matched. We focus on availability-based policies that make matching decisions based solely on agent availability across types (i.e., whether queues are empty or not), rather than relying on complete queue-length information (e.g., the longest-queue policy). We aim to achieve constant regret at all times with optimal scaling in terms of the general position gap, $ε$, which measures the distance of the fluid relaxation from degeneracy. We classify availability-based policies into global and local policies based on the scope of information they utilize. First, for general networks (possibly cyclic), we propose a global availability-based policy, probabilistic matching, and prove that it achieves the optimal all-time regret scaling of $O(ε^{-1})$, matching the known lower bound established by [KAG24]. Second, for acyclic networks, we focus on the class of local availability-based policies, specifically static priority policies that prioritize matches based on a fixed order. Within this class, we derive the first explicit regret bound for the previously proposed tree priority policy, showing all-time regret scaling of $O(ε^{-(d+1)/2})$, where $d$ is the network depth. Next, we introduce a new truncated tree priority policy and prove that it is the first static priority policy to achieve the optimal all-time regret scaling of $O(ε^{-1})$. These policies are appealing for matching systems such as queueing and load balancing; they reduce operational costs by using minimal information while effectively balancing the trade-off between immediate and future rewards.
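To make the distinction between availability and queue-length information concrete, the Python sketch below simulates a generic static priority policy that observes only which queues are nonempty. It is a hypothetical illustration, not the paper's tree priority or truncated tree priority rule: it assumes one arrival per period, matching triggered by each arrival, and a made-up four-type path network. The names `EDGES`, `PRIORITY`, `availability`, `static_priority_match`, and `simulate` are ours.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy instance (not from the paper): a path network 0-1-2-3.
EDGES: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 3)]
# Hypothetical fixed priority order: each type lists ALL of its compatible types.
PRIORITY: Dict[int, List[int]] = {0: [1], 1: [2, 0], 2: [1, 3], 3: [2]}


def availability(queues: Sequence[int]) -> List[bool]:
    """Binary state seen by an availability-based policy: is queue i nonempty?"""
    return [q > 0 for q in queues]


def static_priority_match(arrival: int, avail: Sequence[bool],
                          priority: Dict[int, List[int]]) -> Optional[int]:
    """Partner type for an arriving agent, or None if it should wait.

    Scans the compatible types in a fixed order and takes the first nonempty
    one; only the availability bits are read, never the queue lengths.
    """
    for partner in priority[arrival]:
        if avail[partner]:
            return partner
    return None


def simulate(rates: Sequence[float], priority: Dict[int, List[int]],
             horizon: int, seed: int = 0) -> Tuple[List[int], int]:
    """Assumes exactly one arrival per period, drawn with probabilities `rates`."""
    rng = random.Random(seed)
    types = list(range(len(rates)))
    queues = [0] * len(rates)
    matches = 0
    for _ in range(horizon):
        i = rng.choices(types, weights=rates)[0]
        j = static_priority_match(i, availability(queues), priority)
        if j is None:
            queues[i] += 1  # no available partner: the arrival joins its own queue
        else:
            queues[j] -= 1  # match the arrival with a waiting type-j agent
            matches += 1
    return queues, matches


if __name__ == "__main__":
    final_queues, n_matches = simulate([0.2, 0.3, 0.3, 0.2], PRIORITY, horizon=10_000)
    print(final_queues, n_matches)
```

Because an arriving agent waits only when every compatible queue is empty, and each priority list here contains all compatible types, the simulated state never has two compatible queues that are both nonempty. The policy never reads queue lengths, only the boolean vector returned by `availability`.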

Paper Structure

This paper contains 54 sections, 38 theorems, 182 equations, 6 figures, 1 algorithm.

Key Result

Proposition 1

Suppose that $\mathcal{G}$ satisfies the GPG condition, and let $(z^*, s^*)$ be a non-degenerate optimal solution of $\mathrm{SPP}(\lambda)$ with $\epsilon$ defined in eq:gpg. Then, for every $\lambda' \in \mathbb{R}_{\geq 0}^n$ with $\|\lambda - \lambda'\|_1 \leq \epsilon$, …

Figures (6)

  • Figure 1: Information landscape and regret scaling of dynamic matching policies, with the results in this paper highlighted in blue. The policies are positioned according to the granularity (availability vs. queue lengths) and scope (local vs. global) of the state information they use, along with their all-time regret scaling in terms of the GPG parameter $\epsilon$, where $O_\epsilon(1)$ refers to a constant bound with an unknown dependence on $\epsilon$, and $d$ denotes the depth of the acyclic graph.
  • Figure 2: A path network where $\mathcal{A}_+=\{4\}$ (the root is indicated with a yellow node).
  • Figure 3: Two example matching networks satisfying the GPG condition.
  • Figure 4: Regret comparison of the tree priority ($\mathbf{TP}$), the truncated tree priority ($\mathbf{TTP}$), the longest-queue ($\mathbf{LQ}$), and the probabilistic matching ($\mathbf{PM}$) policies.
  • Figure 5: (LEFT) A cyclic two-way matching network that satisfies the GPG condition, where $z^*=\{0.085, 0.05, 0.32, 0.01, 0.08\}$ and $\epsilon=0.01$. (RIGHT) Regret comparison of the longest-queue ($\mathbf{LQ}$) and the probabilistic matching ($\mathbf{PM}$) policies.
  • ...and 1 more figure
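The two acyclic-network bounds stated in the abstract can be compared directly through their exponents. This is simple arithmetic on the stated upper bounds, ignoring the constants hidden in $O(\cdot)$, and not an additional result of the paper:

$$\frac{\epsilon^{-(d+1)/2}}{\epsilon^{-1}} = \epsilon^{-(d-1)/2}.$$

Hence the tree priority bound coincides with the optimal $O(\epsilon^{-1})$ scaling when $d = 1$ and exceeds it by a factor of $\epsilon^{-(d-1)/2}$ when $d \ge 2$. For example, with $\epsilon = 0.01$ and $d = 3$, the tree priority bound scales as $\epsilon^{-2} = 10^4$, whereas the truncated tree priority bound scales as $\epsilon^{-1} = 10^2$.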

Theorems & Definitions (67)

  • Definition 1: General position gap
  • Proposition 1: Corollary 4.1 in kerimov2025optimality
  • Lemma 1
  • Theorem 1: Probabilistic matching
  • Definition 2: Static priority policy
  • Theorem 2: Tree priority
  • Theorem 3: Truncated tree priority
  • Example 1
  • Definition 3: Lyapunov function
  • Definition 4: Consistency
  • ...and 57 more