Fast and Efficient Matching Algorithm with Deadline Instances

Zhao Song; Weixin Wang; Chenbo Yin; Junze Yin

TL;DR

This paper studies the online weighted bipartite matching problem with deadlines and uses a sketching matrix to approximate edge weights, reducing edge-weight computation from $O(nd)$ to $\tilde{O}(\frac{1}{\epsilon^2}(n+d))$. It introduces FastGreedy and FastPostponedGreedy, achieving competitive ratios $(1-\epsilon)/2$ and $(1-\epsilon)/4$, with space $O(nd + \epsilon^{-2}(n+d)\log(n/\delta))$ and per-operation time $O(\epsilon^{-2}(n+d)\log(n/\delta))$, by applying a Johnson-Lindenstrauss sketch to distance-based edge weights. Empirical results on four real-world datasets show 10–20x speedups while maintaining total matching values close to the original algorithms, validating practical viability for large-scale, high-dimensional data. The framework enables efficient deadline-aware matching in large-scale systems and suggests extending sketching techniques to other variants of online matching and related optimization problems.

Abstract

The online weighted matching problem is a fundamental problem in machine learning due to its numerous applications. Despite many efforts in this area, existing algorithms are either too slow or do not take the $\mathrm{deadline}$ (the longest time during which a node can be matched) into account. In this paper, we first introduce a market model with $\mathrm{deadline}$. Next, we present two optimized algorithms (\textsc{FastGreedy} and \textsc{FastPostponedGreedy}) and give theoretical proofs of their time complexity and correctness. In the \textsc{FastGreedy} algorithm, it is known in advance whether each node is a buyer or a seller, whereas in the \textsc{FastPostponedGreedy} algorithm the status of each node is initially unknown. Then, we generalize a sketching matrix and run both the original algorithms and ours on real and synthetic data sets. Let $ε\in (0,0.1)$ denote the relative error in approximating the true weight of each edge. The competitive ratios of the original \textsc{Greedy} and \textsc{PostponedGreedy} algorithms are $\frac{1}{2}$ and $\frac{1}{4}$, respectively. Building on these two algorithms, we propose \textsc{FastGreedy} and \textsc{FastPostponedGreedy}, whose competitive ratios are $\frac{1 - ε}{2}$ and $\frac{1 - ε}{4}$, respectively. At the same time, our algorithms run faster than the original two. Given $n$ nodes in $\mathbb{R} ^ d$, we decrease the time complexity from $O(nd)$ to $\widetilde{O}(ε^{-2} \cdot (n + d))$, where for any function $f$, we use $\widetilde{O}(f)$ to denote $f \cdot \mathrm{poly}(\log f)$.
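One way to see where the improvement from $O(nd)$ to $\widetilde{O}(ε^{-2} \cdot (n + d))$ can come from is a rough per-operation cost count. The accounting below is a sketch under stated assumptions, not the paper's proof: it assumes edge weights are Euclidean distances, that a dense sketching matrix $S \in \mathbb{R}^{s \times d}$ is applied once to each arriving point, and that at most $n$ sketched points are stored. The symbols $S$, $T_{\text{sketch}}$, $T_{\text{compare}}$, $T_{\text{total}}$, and $T_{\text{exact}}$ are introduced here for illustration only.

```latex
\begin{align*}
s &= O(\epsilon^{-2}\log(n/\delta)) && \text{sketch dimension} \\
T_{\text{sketch}} &= O(sd) && \text{compute } Sx \text{ for the arriving } x \in \mathbb{R}^d \\
T_{\text{compare}} &= O(sn) && \text{distances from } Sx \text{ to at most } n \text{ stored sketches} \\
T_{\text{total}} &= O(s(n+d)) = O(\epsilon^{-2}(n+d)\log(n/\delta)) \\
T_{\text{exact}} &= O(nd) && \text{distances to at most } n \text{ points in } \mathbb{R}^d
\end{align*}
```

Under the same assumptions, storing the original points, the sketched points, and $S$ takes $O(nd + sn + sd) = O(nd + \epsilon^{-2}(n+d)\log(n/\delta))$ space, which agrees with the space bound quoted in the TL;DR.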

Paper Structure (26 sections, 28 theorems, 9 equations, 23 figures, 2 tables, 5 algorithms)

Introduction
Related Work
Online Weighted Bipartite Matching.
Fast Algorithm via Data Structure.
Roadmap.
Preliminaries
Notation.
Model
Useful Lemma
Algorithm
Experiments
Conclusion
Roadmap.
Missing Proofs
Proof of Lemma \ref{lem:standard_greedy_correctness}
...and 11 more sections

Key Result

Lemma 2.5

For any $X \subset \mathbb{R}^d$ of size $n$, there exists an embedding $f: \mathbb{R}^d \to \mathbb{R}^s$ with $s = O(\epsilon^{-2} \log n)$ such that $(1-\epsilon) \cdot \| x - y \|_2 \leq \| f(x) - f(y) \|_2 \leq (1+\epsilon)\cdot \| x - y \|_2$ for all $x, y \in X$.
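The distance-preservation guarantee in Lemma 2.5 can be checked empirically on a small example. The sketch below uses a scaled Gaussian matrix, which is one common way to realize a Johnson-Lindenstrauss embedding; the source does not say which sketching matrix the paper uses, so this choice is an assumption. The helper names (`gaussian_sketch`, `project`, `max_distortion`) and the sizes `n`, `d`, `s` are hypothetical and chosen only to keep the pure-Python run short.

```python
import math
import random

def gaussian_sketch(d, s, rng):
    """Return a d x s matrix with i.i.d. N(0, 1/s) entries (assumed JL construction)."""
    scale = 1.0 / math.sqrt(s)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(s)] for _ in range(d)]

def project(x, S):
    """Map a length-d vector x to a length-s vector by computing x^T S."""
    s = len(S[0])
    out = [0.0] * s
    for xi, row in zip(x, S):
        for j in range(s):
            out[j] += xi * row[j]
    return out

def dist(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def max_distortion(X, Y):
    """Largest relative error |dist(Y_i, Y_j) / dist(X_i, X_j) - 1| over all pairs."""
    worst = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            ratio = dist(Y[i], Y[j]) / dist(X[i], X[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst

rng = random.Random(0)
n, d, s = 20, 400, 200  # illustrative sizes, not taken from the paper
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
S = gaussian_sketch(d, s, rng)
Y = [project(x, S) for x in X]
eps_observed = max_distortion(X, Y)
print(f"observed max distortion: {eps_observed:.3f}")
```

The printed value plays the role of $\epsilon$ in the lemma for this particular point set and random draw; increasing `s` should shrink it at roughly a $1/\sqrt{s}$ rate.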

Figures (23)

Figure 1: The relationship between running time and the parameters $s$ and $\mathrm{dl}$ on the GECRS data set. The parameters are defined as follows: $n$ is the node count, $d$ is the original node dimension, $s$ is the dimension after transformation, and $\mathrm{dl}$ is the maximum matching time per node (referred to as the deadline). Here GECRS denotes the gene expression cancer RNA-Seq Data Set, and PGreedy denotes Postponed Greedy.
Figure 2: The relationship between running time and the parameters $s$ and $\mathrm{dl}$ on the Arcene data set. The parameters are defined as follows: $n$ is the node count, $d$ is the original node dimension, $s$ is the dimension after transformation, and $\mathrm{dl}$ is the maximum matching time per node (referred to as the deadline).
Figure 3: The relationship between running time and the parameters $s$ and $\mathrm{dl}$ on the ARBT data set. The parameters are defined as follows: $n$ is the node count, $d$ is the original node dimension, $s$ is the dimension after transformation, and $\mathrm{dl}$ is the maximum matching time per node (referred to as the deadline). Here ARBT denotes A Study of Asian Religious and Biblical Texts Data Set.
Figure 4: The relationship between running time and the parameters $s$ and $\mathrm{dl}$ on the REJAFADA data set. The parameters are defined as follows: $n$ is the node count, $d$ is the original node dimension, $s$ is the dimension after transformation, and $\mathrm{dl}$ is the maximum matching time per node (referred to as the deadline).
Figure 5: Comparison of the running times of each pair of algorithms.
...and 18 more figures

Theorems & Definitions (47)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Lemma 2.5: JL Lemma (Johnson and Lindenstrauss, 1984)
Lemma 3.1: Restatement of Lemma \ref{lem:standard_greedy_correctness_formal}
Lemma 3.2: Restatement of Lemma \ref{lem:calculate_the_competive_ratio_formal}
Lemma 3.3: Restatement of Lemma \ref{lem:standard_postponed_greedy_correctness_formal}
Theorem 3.4
proof
...and 37 more
