
Fast and Efficient Matching Algorithm with Deadline Instances

Zhao Song, Weixin Wang, Chenbo Yin, Junze Yin

TL;DR

This paper studies the online weighted bipartite matching problem with deadlines and uses a sketching matrix to approximate edge weights, reducing edge-weight computation from $O(nd)$ to $\widetilde{O}(\epsilon^{-2}(n+d))$. It introduces FastGreedy and FastPostponedGreedy, achieving competitive ratios $(1-\epsilon)/2$ and $(1-\epsilon)/4$, with space $O(nd + \epsilon^{-2}(n+d)\log(n/\delta))$ and per-operation time $O(\epsilon^{-2}(n+d)\log(n/\delta))$, by applying a Johnson-Lindenstrauss sketch to distance-based edge weights. Empirical results on four real-world datasets show 10–20x speedups while maintaining total matching values close to those of the original algorithms, validating practical viability for large-scale, high-dimensional data. The framework enables efficient deadline-aware matching in large-scale systems and suggests extending sketching techniques to other variants of online matching and related optimization problems.
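The core mechanism in the TL;DR is simple: sketch each node once, then evaluate every distance-based weight in the lower dimension $s$. The following minimal NumPy sketch illustrates this with a dense Gaussian Johnson-Lindenstrauss matrix. It is not the paper's implementation. The synthetic data, the Gaussian construction, and the helper names `gaussian_sketch` and `sketched_distance` are all our own assumptions.

```python
import numpy as np


def gaussian_sketch(d: int, s: int, rng: np.random.Generator) -> np.ndarray:
    """Assumed construction: dense Gaussian S in R^{s x d}, scaled so E||Sx||^2 = ||x||^2."""
    return rng.standard_normal((s, d)) / np.sqrt(s)


def sketched_distance(sx: np.ndarray, sy: np.ndarray) -> float:
    """Distance between two already-sketched points: O(s) work instead of O(d)."""
    return float(np.linalg.norm(sx - sy))


rng = np.random.default_rng(0)
n, d, s = 50, 1000, 400
X = rng.standard_normal((n, d))  # n synthetic nodes in R^d
S = gaussian_sketch(d, s, rng)
SX = X @ S.T                     # sketch every node once: row i equals S @ X[i]

# Worst relative distortion of sketched distances over all pairs.
worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        exact = np.linalg.norm(X[i] - X[j])
        approx = sketched_distance(SX[i], SX[j])
        worst = max(worst, abs(approx / exact - 1.0))
print(f"max relative distortion over all pairs: {worst:.3f}")
```

Take $s = O(\epsilon^{-2}\log(n/\delta))$. Sketching one arriving node then costs $O(sd)$, and comparing it against $n$ stored sketches costs $O(sn)$. The total is consistent with the $O(\epsilon^{-2}(n+d)\log(n/\delta))$ per-operation form quoted above, versus $O(nd)$ for exact distances.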

Abstract

The online weighted matching problem is a fundamental problem in machine learning due to its numerous applications. Despite many efforts in this area, existing algorithms are either too slow or do not take the $\mathrm{deadline}$ (the longest time a node can wait in the market to be matched) into account. In this paper, we first introduce a market model that incorporates the $\mathrm{deadline}$. Next, we present two optimized algorithms, \textsc{FastGreedy} and \textsc{FastPostponedGreedy}, and give theoretical proofs of their time complexity and correctness. In \textsc{FastGreedy}, we know in advance whether each node is a buyer or a seller; in \textsc{FastPostponedGreedy}, the status of each node is initially unknown. We then generalize a sketching matrix and run both the original algorithms and ours on real and synthetic data sets. Let $\epsilon \in (0,0.1)$ denote the relative error allowed in approximating the true weight of each edge. The original \textsc{Greedy} and \textsc{PostponedGreedy} have competitive ratios $\frac{1}{2}$ and $\frac{1}{4}$, respectively. Building on these two algorithms, we propose \textsc{FastGreedy} and \textsc{FastPostponedGreedy}, whose competitive ratios are $\frac{1 - \epsilon}{2}$ and $\frac{1 - \epsilon}{4}$, respectively, and which run faster than the original two algorithms. Given $n$ nodes in $\mathbb{R}^d$, we decrease the time complexity from $O(nd)$ to $\widetilde{O}(\epsilon^{-2} \cdot (n + d))$, where for any function $f$, we use $\widetilde{O}(f)$ to denote $f \cdot \mathrm{poly}(\log f)$.
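The following minimal Python sketch makes the deadline model concrete with one greedy rule for the known-type (\textsc{Greedy}) setting. It rests on our own reading of the model, not the paper's definition:

- Nodes arrive at integer times.
- Each node stays for $\mathrm{dl}$ steps.
- When an unmatched node reaches its deadline, it is matched to the present, unmatched opposite-side node with the largest positive weight.

The names `Node` and `greedy_with_deadlines`, and the similarity weight in the toy instance, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(eq=False)  # identity comparison, so `in` / `remove` act on specific nodes
class Node:
    idx: int        # node identifier used to look up weights
    arrival: int    # integer arrival time
    is_buyer: bool  # side known in advance (the Greedy setting)


def greedy_with_deadlines(
    nodes: List[Node], dl: int, weight: Callable[[int, int], float]
) -> List[Tuple[int, int, float]]:
    """Hypothetical rule: at time arrival + dl, an unmatched node is matched to the
    present, unmatched opposite-side node of largest positive weight, else it leaves."""
    arrivals: Dict[int, List[Node]] = {}
    for v in nodes:
        arrivals.setdefault(v.arrival, []).append(v)
    horizon = max(v.arrival for v in nodes) + dl

    present: List[Node] = []  # arrived, unmatched, deadline not yet reached
    matches: List[Tuple[int, int, float]] = []
    for t in range(horizon + 1):
        present.extend(arrivals.get(t, []))
        # Nodes whose deadline is exactly t, handled in arrival order.
        for v in [u for u in present if u.arrival + dl == t]:
            if v not in present:  # already taken by an earlier critical node
                continue
            present.remove(v)
            others = [u for u in present if u.is_buyer != v.is_buyer]
            if not others:
                continue  # no counterpart: v leaves unmatched
            best = max(others, key=lambda u: weight(v.idx, u.idx))
            w = weight(v.idx, best.idx)
            if w > 0:
                present.remove(best)
                matches.append((v.idx, best.idx, w))
    return matches


# Toy instance: 1-D positions with a hypothetical similarity weight.
pos = [0.0, 5.0, 0.5, 5.2]
nodes = [Node(0, 0, False), Node(1, 1, True), Node(2, 2, True), Node(3, 3, False)]
result = greedy_with_deadlines(
    nodes, dl=2, weight=lambda i, j: 1.0 / (1.0 + abs(pos[i] - pos[j]))
)
print(result)
```

Under this reading, a \textsc{FastGreedy}-style variant would keep the same loop and only change `weight` so that it compares Johnson-Lindenstrauss-sketched vectors instead of the original $d$-dimensional ones.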

Paper Structure (26 sections, 28 theorems, 9 equations, 23 figures, 2 tables, 5 algorithms)


Key Result

Lemma 2.5

For any $X \subset \mathbb{R}^d$ of size $n$, there exists an embedding $f: \mathbb{R}^d \to \mathbb{R}^s$ with $s = O(\epsilon^{-2} \log n)$ such that $(1-\epsilon) \cdot \| x - y \|_2 \leq \| f(x) - f(y) \|_2 \leq (1+\epsilon)\cdot \| x - y \|_2$ for all $x, y \in X$.
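To see where $s = O(\epsilon^{-2}\log n)$ comes from, here is the standard Gaussian-projection argument. It assumes $f$ is a scaled Gaussian matrix; the paper may rely on a different construction or different constants.

```latex
% Assumed construction: a scaled Gaussian matrix.
f(x) = \tfrac{1}{\sqrt{s}}\, S x, \qquad S \in \mathbb{R}^{s \times d},\ S_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1).

% For a fixed pair with u = x - y, \|f(u)\|_2^2 / \|u\|_2^2 \sim \chi^2_s / s, and chi-square tail bounds give
\Pr\Big[\big|\|f(u)\|_2^2 - \|u\|_2^2\big| > \epsilon \|u\|_2^2\Big]
  \le 2\exp\!\Big(-\tfrac{s}{2}\big(\tfrac{\epsilon^2}{2} - \tfrac{\epsilon^3}{3}\big)\Big).

% Choose s so each pair fails with probability at most 2/n^2, then take a union bound over all pairs:
s \ge \frac{4\ln n}{\epsilon^2/2 - \epsilon^3/3} = O(\epsilon^{-2}\log n)
  \quad\Longrightarrow\quad
\Pr[\text{some pair fails}] \le \binom{n}{2}\cdot\frac{2}{n^2} = 1 - \frac{1}{n} < 1.

% Hence a good f exists. Squared distances within (1 \pm \epsilon) give the lemma's form, since for \epsilon \in (0,1):
1 - \epsilon \le \sqrt{1-\epsilon}, \qquad \sqrt{1+\epsilon} \le 1 + \epsilon.
```

The denominator satisfies $\epsilon^2/2 - \epsilon^3/3 \ge \epsilon^2/6$ for $\epsilon \in (0,1)$, which is why the bound on $s$ is $O(\epsilon^{-2}\log n)$.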

Figures (23)

  • Figure 1: The relationship between running time and the parameters $s$ and $\mathrm{dl}$ on the GECRS data set. The parameters are defined as follows: $n$ is the node count, $d$ is the original node dimension, $s$ is the dimension after transformation, and $\mathrm{dl}$ is the maximum matching time per node (referred to as the deadline). Here GECRS denotes the gene expression cancer RNA-Seq data set, and PGreedy denotes PostponedGreedy.
  • Figure 2: The relationship between running time and the parameters $s$ and $\mathrm{dl}$ on the Arcene data set. The parameters are defined as follows: $n$ is the node count, $d$ is the original node dimension, $s$ is the dimension after transformation, and $\mathrm{dl}$ is the maximum matching time per node (referred to as the deadline).
  • Figure 3: The relationship between running time and the parameters $s$ and $\mathrm{dl}$ on the ARBT data set. The parameters are defined as follows: $n$ is the node count, $d$ is the original node dimension, $s$ is the dimension after transformation, and $\mathrm{dl}$ is the maximum matching time per node (referred to as the deadline). Here ARBT denotes the A Study of Asian Religious and Biblical Texts data set.
  • Figure 4: The relationship between running time and the parameters $s$ and $\mathrm{dl}$ on the REJAFADA data set. The parameters are defined as follows: $n$ is the node count, $d$ is the original node dimension, $s$ is the dimension after transformation, and $\mathrm{dl}$ is the maximum matching time per node (referred to as the deadline).
  • Figure 5: Comparison of the running times of each pair of algorithms.
  • ...and 18 more figures

Theorems & Definitions (47)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Lemma 2.5: JL Lemma, jl84
  • Lemma 3.1: Restatement of Lemma \ref{lem:standard_greedy_correctness_formal}
  • Lemma 3.2: Restatement of Lemma \ref{lem:calculate_the_competive_ratio_formal}
  • Lemma 3.3: Restatement of Lemma \ref{lem:standard_postponed_greedy_correctness_formal}
  • Theorem 3.4
  • proof
  • ...and 37 more