
Sublinear Time Algorithm for Online Weighted Bipartite Matching

Hang Hu, Zhao Song, Runzhou Tao, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

TL;DR

This paper tackles online weighted bipartite matching where edge weights are given by inner products of $d$-dimensional feature vectors, a setting common in recommendation and ranking systems with large item sets. It develops randomized data structures that approximate weights, enabling sublinear-time weight computations per arriving online vertex while preserving a $\frac{1}{2}$-approximation ratio via a robust greedy framework. The authors introduce a dynamic inner-product estimation structure and, for the distance and inner-product variants, derive per-arrival time bounds of $\widetilde{O}(\epsilon^{-2}(n+d)\log(n/\delta))$ and $\widetilde{O}(\epsilon^{-2}D^2(n+d)\log(n/\delta))$ respectively, with high-probability guarantees; they further improve the inner-product case using a Max-IP-based approach to achieve sublinear updates with a tunable exponent. The work demonstrates that sublinear-time online matching is feasible in a meaningful subset of the problem space, offering practical latency improvements for large-scale systems while maintaining rigorous competitive guarantees.

Abstract

Online bipartite matching is a fundamental problem in online algorithms. The goal is to match two sets of vertices so as to maximize the sum of the edge weights, where the vertices of one set arrive one at a time, each together with its edge weights. In practical recommendation systems and search engines, the weights are determined by the inner product between the deep representation of a user and the deep representation of an item. Standard online matching spends $nd$ time on a linear scan that computes the weights of all $n$ items (assuming each representation vector has length $d$) and then decides the matching based on those weights. In practice, however, $n$ can be very large, e.g., on online e-commerce platforms. Improving the time needed to compute the weights is therefore a problem of practical significance. In this work, we provide a theoretical foundation for computing the weights approximately. We show that, with our proposed randomized data structures, the weights can be computed in sublinear time while still preserving the competitive ratio of the matching algorithm.
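To make the $nd$ cost concrete, the following is a minimal sketch of the exact linear-scan baseline the abstract describes. It assumes a fixed set of item vectors and users arriving one at a time, and it uses a plain greedy rule (match each user to the best still-unmatched item). The function name `greedy_match_exact` and the variable names are illustrative and do not come from the paper, and this simple rule is not claimed to be the paper's algorithm or to achieve its competitive ratio.

```python
import numpy as np

def greedy_match_exact(items, users):
    """Match each arriving user to the best still-unmatched item.

    items: (n, d) array of item representations (hypothetical name).
    users: (m, d) array of user representations, processed in arrival order.
    Returns (matching, total_weight); matching[i] is the item index chosen
    for user i, or -1 if no item was left.
    """
    n = items.shape[0]
    available = np.ones(n, dtype=bool)
    matching = []
    total = 0.0
    for u in users:
        if not available.any():
            matching.append(-1)
            continue
        # Linear scan: n inner products of length d, i.e. O(n d) per arrival.
        weights = items @ u
        # Items that are already matched cannot be chosen again.
        weights[~available] = -np.inf
        j = int(np.argmax(weights))
        matching.append(j)
        total += float(weights[j])
        available[j] = False
    return matching, total
```

The `items @ u` line is the step whose cost the paper aims to reduce; everything else in the loop is independent of $d$.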

Paper Structure

This paper contains 36 sections, 29 theorems, 41 equations, and 6 algorithms.

Key Result

Theorem 1.2

Let $\epsilon \in (0,1)$ and $\delta \in (0,1)$. Then there is an online bipartite matching algorithm (with $w(x,y)=\| x- y\|_2$) that takes $\widetilde{O}(\epsilon^{-2}(n+d)\log(n/\delta))$ time for each arriving vertex and, with probability at least $1-\delta$, satisfies its approximation guarantee (the $\frac{1}{2}$-approximation ratio described in the TL;DR), where $n$ is the number of online points.
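One standard way to obtain a per-arrival cost of the shape $\epsilon^{-2}(n+d)\log(n/\delta)$ quoted in the TL;DR for the distance variant is a Johnson-Lindenstrauss style random projection. The sketch below illustrates that general technique under stated assumptions; it is not the paper's data structure. The class name `SketchedDistances`, the constant 8 in the sketch size, and the choice of which points are stored are all illustrative.

```python
import numpy as np

class SketchedDistances:
    """Estimate ||x - y||_2 to every stored point from k-dimensional sketches."""

    def __init__(self, points, eps, delta, seed=0):
        n, d = points.shape
        # Sketch size k = O(eps^-2 log(n / delta)); the constant 8 is an
        # illustrative assumption, not a value taken from the paper.
        self.k = max(1, int(np.ceil(8 * np.log(n / delta) / eps**2)))
        rng = np.random.default_rng(seed)
        # Gaussian projection scaled so squared norms are preserved in expectation.
        self.proj = rng.standard_normal((self.k, d)) / np.sqrt(self.k)
        self.sketches = points @ self.proj.T  # shape (n, k), computed once

    def query(self, x):
        sx = self.proj @ x  # O(k d): sketch the arriving point
        # O(n k): compare sketches instead of full d-dimensional vectors.
        return np.linalg.norm(self.sketches - sx, axis=1)
```

Each query costs $O(k(n+d))$ with $k = O(\epsilon^{-2}\log(n/\delta))$, compared with $O(nd)$ for exact distances. The saving only materializes when $k$ is smaller than $d$.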

Theorems & Definitions (56)

  • Definition 1.1: Informal version of Definition \ref{def:main_problem}
  • Theorem 1.2: Main result, informal version of Theorem \ref{thm:distance_weight}
  • Theorem 1.3: Main result, informal version of Theorem \ref{thm:inner_product_weight}
  • Theorem 1.4: Main result, informal version of Theorem \ref{thm:inner_product_weight_v2}
  • Definition 2.1: Locality sensitive hashing [im98]
  • Definition 2.2: Approximate Near Neighbor
  • Definition 2.3: Approximate Max-IP
  • Theorem 2.4: Andoni and Razenshteyn [ar15]
  • Theorem 2.5: Theorem 8.2, page 19 of [ssx21]
  • Definition 2.6: Submodular function [s03]
  • ...and 46 more
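As a rough illustration of the locality sensitive hashing idea named in Definition 2.1, the sketch below uses sign random projections (SimHash), one classical LSH family for angular similarity. This is an assumption made for illustration only: the listing above does not say which hash family the paper uses, and all names here (`simhash`, `planes`, `near`, `far`) are hypothetical.

```python
import numpy as np

def simhash(vectors, planes):
    """Return one bit per hyperplane: whether the projection is non-negative."""
    return (vectors @ planes.T) >= 0

rng = np.random.default_rng(0)
d, num_planes = 64, 4096
planes = rng.standard_normal((num_planes, d))

x = rng.standard_normal(d)
near = x + 0.1 * rng.standard_normal(d)  # small perturbation of x
far = rng.standard_normal(d)             # unrelated random vector

bits = simhash(np.stack([x, near, far]), planes)
# Fraction of hash bits on which each pair agrees; for this family the
# per-bit collision probability is 1 - angle(x, y) / pi.
agree_near = float(np.mean(bits[0] == bits[1]))
agree_far = float(np.mean(bits[0] == bits[2]))
print(agree_near, agree_far)
```

The nearby pair agrees on far more bits than the unrelated pair, which is the property an LSH family needs: similar points collide with higher probability than dissimilar ones.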