Pair-Matching: Links Prediction with Adaptive Queries

Christophe Giraud, Yann Issartel, Luc Lehéricy, Matthieu Lerasle

TL;DR

This work analyzes the sequential pair-matching problem under a two-class assortative stochastic block model (SBM), where the goal is to reveal as many edges as possible within a query budget while limiting per-node sampling. By framing pair-matching as a constrained, structured bandit problem and leveraging clustering results for SBMs, the authors derive tight sublinear regret bounds and design a polynomial-time algorithm that achieves the optimal rate up to constants. A key finding is a phase transition governed by the Kesten-Stigum (KS) threshold, with regret scaling as $T \wedge \frac{\sqrt{T} \vee (T/B_T)}{s}$, and a clear degradation under sparse sampling ($B_T$ small). The paper also extends to multiple communities, discusses potential information-computation gaps, and validates the theory with extensive numerical experiments, including estimation of the scaling parameter $s$ and robustness to misspecification. Overall, this work provides a principled framework for adaptive link prediction under structural graph priors and practical sampling constraints, with implications for efficient partner matching, network science, and active learning in networks.
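To make the regret scaling concrete, the sketch below evaluates the order of magnitude $T \wedge \frac{\sqrt{T} \vee (T/B_T)}{s}$ with all constants ignored. The function name `regret_rate` is hypothetical, the scaling parameter `s` is taken as an input rather than computed from $p$ and $q$, and using an infinite `B_T` to stand for the unconstrained case is an assumption of this sketch, as is reading $B_T$ as the per-node sampling cap.

```python
import math


def regret_rate(T, s, B_T=math.inf):
    """Order of the regret, T ∧ (sqrt(T) ∨ (T / B_T)) / s, constants ignored.

    T   : query budget (positive).
    s   : scaling parameter (positive), supplied by the caller.
    B_T : assumed per-node sampling cap; math.inf means no constraint.
    """
    if T <= 0 or s <= 0 or B_T <= 0:
        raise ValueError("T, s and B_T must be positive")
    # sqrt(T) ∨ (T / B_T): the constraint term dominates when B_T < sqrt(T).
    inner = max(math.sqrt(T), T / B_T)
    # T ∧ (...) / s: the regret can never exceed the number of queries.
    return min(T, inner / s)


# Small budget: the sqrt(T)/s term exceeds T, so the rate is linear in T.
small_budget = regret_rate(T=9, s=0.25)
# Larger budget, no constraint: the rate is sqrt(T) / s.
large_budget = regret_rate(T=10_000, s=0.25)
# Same budget with a tight per-node cap: the rate degrades to T / (B_T * s).
constrained = regret_rate(T=10_000, s=0.25, B_T=10)
```

In this reading, the unconstrained rate switches from linear to $\sqrt{T}/s$ once $\sqrt{T}$ exceeds $1/s$, and the cap $B_T$ only changes the rate when it is smaller than $\sqrt{T}$.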

Abstract

The pair-matching problem appears in many applications where one wants to discover good matches between pairs of entities or individuals. Formally, the set of individuals is represented by the nodes of a graph where the edges, unobserved at first, represent the good matches. The algorithm queries pairs of nodes and observes the presence or absence of edges. Its goal is to discover as many edges as possible with a fixed budget of queries. Pair-matching is a particular instance of the multi-armed bandit problem in which the arms are pairs of individuals and the rewards are edges linking these pairs. This bandit problem is non-standard, though, as each arm can only be played once. Given this last constraint, sublinear regret can be expected only if the graph presents some underlying structure. This paper shows that sublinear regret is achievable in the case where the graph is generated according to a Stochastic Block Model (SBM) with two communities. Optimal regret bounds are computed for this pair-matching problem. They exhibit a phase transition related to the Kesten-Stigum threshold for community detection in the SBM. The pair-matching problem is also considered in the case where each node is constrained to be sampled fewer than a given number of times. We show how optimal regret rates depend on this constraint. The paper concludes with a conjecture regarding the optimal regret when the number of communities is larger than 2. Contrary to the two-community case, we argue that a statistical-computational gap would appear in this problem.
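The setup described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's algorithm: it samples a balanced two-community SBM, then compares a uniformly random query strategy with a benchmark that already knows the communities and only queries within-community pairs. Every pair is queried at most once, as the problem requires. All function names and the parameter values (`n`, `p`, `q`, `T`, the seed) are hypothetical choices for illustration, and the regret is measured against $pT$, following the expression used in the caption of Figure 5.

```python
import itertools
import random


def sample_two_community_sbm(n, p, q, rng):
    """Sample a balanced two-community SBM on n nodes (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even for balanced communities")
    labels = [0] * (n // 2) + [1] * (n // 2)
    edges = {}
    for i, j in itertools.combinations(range(n), 2):
        # Within-community pairs connect with probability p, others with q.
        prob = p if labels[i] == labels[j] else q
        edges[(i, j)] = rng.random() < prob
    return labels, edges


def run_random_queries(edges, T, rng):
    """Query T distinct pairs uniformly at random; return edges discovered."""
    queried = rng.sample(sorted(edges), T)  # each pair is played at most once
    return sum(edges[pair] for pair in queried)


def run_oracle_queries(labels, edges, T, rng):
    """Benchmark that knows the labels and queries only within-community pairs."""
    good_pairs = [e for e in sorted(edges) if labels[e[0]] == labels[e[1]]]
    queried = rng.sample(good_pairs, T)
    return sum(edges[pair] for pair in queried)


rng = random.Random(0)
n, p, q, T = 40, 0.4, 0.1, 200  # illustrative values only
labels, edges = sample_two_community_sbm(n, p, q, rng)
random_reward = run_random_queries(edges, T, rng)
oracle_reward = run_oracle_queries(labels, edges, T, rng)
print("random strategy regret vs pT:", p * T - random_reward)
print("oracle benchmark regret vs pT:", p * T - oracle_reward)
```

The random strategy spends roughly half of its queries on between-community pairs, so its regret grows linearly in $T$. A strategy with sublinear regret has to learn the communities from the answers to its own queries, which is where the SBM structure comes in.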

Paper Structure

This paper contains 65 sections, 20 theorems, 192 equations, 5 figures.

Key Result

Theorem 1

Let $T$ and $n$ be positive integers with $T \leq |\mathcal{E}^{\text{good}}|=2\binom{n/2}{2}$. Let $p,q \in [0,1/2]$ be two parameters fulfilling (eq:pq) and such that where the scaling parameter $s$ is defined in eq:def:ScalingParam. Then, for any $\mu \in \text{cSBM}(n/2,n/2,p,q)$, Moreover, there exist two numerical constants $c_{1},c_{2}>0$, and a strategy $\psi \in \Psi_\infty$ correspondin

Figures (5)

  • Figure 1: Average number of errors $\bar{N}^\text{bad}$ over 10 simulations in the unconstrained case with balanced communities. The graphs show $\log(\bar{N}^\text{bad})$ (left) and $\log(s \bar{N}^\text{bad})$ (right) as a function of $\log(T)$, confirming the two regimes (linear in $T$ and proportional to $\sqrt{T}/s$). The lines have slope 1 and $1/2$ respectively. Green: $s=0.01$, black: $s=0.04$, blue: $s=0.06666...$, red: $s=0.16$.
  • Figure 2: Average number of errors $\bar{N}^\text{bad}$ over 10 simulations in the unconstrained case with an 80%-20% split between communities. The graphs show $\log(\bar{N}^\text{bad})$ (left) and $\log(s \bar{N}^\text{bad})$ (right) as a function of $\log(T)$. The lines have slope 1 and $1/2$ respectively. Green: $s=0.01$, black: $s=0.04$, blue: $s=0.06666...$, red: $s=0.16$.
  • Figure 3: Average number of errors $\bar{N}^\text{bad}$ over 10 simulations in the constrained case with balanced communities. The graphs show $\log(s \bar{N}^\text{bad})$ as a function of $\log(T)$. The lines have slope $1/2$ and $1$ respectively. This confirms the three regimes of Theorem \ref{thm:contraint}: linear in $T$ for small and large $T$ and proportional to $\sqrt{T}/s$ in between. Black: $s=0.04$, green: $s=0.06666...$, red: $s=0.16$.
  • Figure 4: On the left, ratio of the true scaling parameter $s$ to the estimated scaling parameter $\hat{s}$ as defined in Section \ref{sec:SNR}. The red line corresponds to $s/\hat{s} = 1$. On the right, product $s\hat{N}$ where $\hat{N}$ is the number of nodes at the stopping time. The red line corresponds to $s\hat{N} = 2$.
  • Figure 5: Logarithm of the average regret $\frac{1}{10} \sum_{k=1}^{10} \left( pT - \sum_{t=1}^T A_{\widehat{e}_t}^{(k)} \right)$ in the misspecified case, as a function of $\log(T)$. The lines have slope $1/2$ and $1$ respectively. Black: $\sigma=0$, red: $\sigma=0.1$, green: $\sigma=0.2$, blue: $\sigma=0.3$, cyan: $\sigma=0.4$, violet: $\sigma=0.5$. The two regimes from the well specified case (linear then proportional to $\sqrt{T}$) are still visible.

Theorems & Definitions (21)

  • Theorem 1
  • Theorem 2
  • Conjecture 1
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Theorem 7
  • Lemma 8
  • Lemma 9
  • ...and 11 more