Statistical inference of a ranked community in a directed graph

Dmitriy Kunisky, Daniel A. Spielman, Alexander S. Wein, Xifan Yu

TL;DR

This work analyzes the planted ranked subgraph (PRS) model in directed graphs, in which a hidden subset $S$ carries a latent ranking $\pi$ that biases the directions of edges within $S$ while the overall edge density remains uniform. The authors establish detection and recovery thresholds, separating statistical from computational feasibility via the low-degree polynomial framework in the log-density setting, and give finer-grained results for two extreme parameter regimes with $p=1$. They introduce Ranking By Wins as a simple, near-optimal method for recovering the planted ranking, with sharp guarantees and alignment properties, and show that spectral methods can be suboptimal for detection but competitive for recovery in certain regimes. The results exhibit both statistical-computational gaps and regimes where efficient algorithms achieve optimal or near-optimal performance. The findings connect planted-ranking problems to low-degree conjectures and spiked matrix models, and yield precise phase diagrams and practical algorithms for identifying ranked communities in large digraphs.
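The summary above names Ranking By Wins without spelling it out. The following is a minimal sketch, assuming the method orders items by how many pairwise comparisons they win; that reading of the name is an assumption, not a definition taken from the paper. The function names, the $\pm 1$ matrix encoding, and the helper `noisy_tournament` are hypothetical and exist only for illustration.

```python
import numpy as np


def noisy_tournament(hidden_order, q, rng):
    """Build a full tournament (p = 1, k = n) from a hidden ranking.

    hidden_order[0] is the top-ranked item. Each pairwise result agrees
    with the hidden ranking with probability 1/2 + q. The encoding is an
    illustrative assumption: A[i, j] = +1 if i beat j, -1 if j beat i.
    """
    n = len(hidden_order)
    pos = np.empty(n, dtype=int)
    pos[hidden_order] = np.arange(n)  # pos[i] = rank position of item i
    # +1 where the row item is ranked above the column item, 0 on the diagonal.
    A = np.sign(pos[None, :] - pos[:, None])
    iu, ju = np.triu_indices(n, k=1)
    flip = rng.random(iu.size) >= 0.5 + q  # disagree with probability 1/2 - q
    s = np.where(flip, -A[iu, ju], A[iu, ju])
    A[iu, ju] = s
    A[ju, iu] = -s  # keep the matrix antisymmetric
    return A


def ranking_by_wins(A):
    """Order items from most wins to fewest (ties broken by index)."""
    wins = (A > 0).sum(axis=1)
    return np.argsort(-wins, kind="stable")


rng = np.random.default_rng(0)
hidden = rng.permutation(12)
estimate = ranking_by_wins(noisy_tournament(hidden, q=0.5, rng=rng))
```

With $q = \frac{1}{2}$ the tournament is noiseless, every item has a distinct win count, and the estimate coincides with the hidden order. For smaller $q$ the win counts become noisy and the estimate only approximates the ranking.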

Abstract

We study the problem of detecting or recovering a planted ranked subgraph from a directed graph, an analog for directed graphs of the well-studied planted dense subgraph model. We suppose that, among a set of $n$ items, there is a subset $S$ of $k$ items having a latent ranking in the form of a permutation $\pi$ of $S$, and that we observe a fraction $p$ of pairwise orderings between elements of $\{1, \dots, n\}$ which agree with $\pi$ with probability $\frac{1}{2} + q$ between elements of $S$ and otherwise are uniformly random. Unlike in the planted dense subgraph and planted clique problems where the community $S$ is distinguished by its unusual density of edges, here the community is only distinguished by the unusual consistency of its pairwise orderings. We establish computational and statistical thresholds for both detecting and recovering such a ranked community. In the log-density setting where $k$, $p$, and $q$ all scale as powers of $n$, we establish the exact thresholds in the associated exponents at which detection and recovery become statistically and computationally feasible. These regimes include a rich variety of behaviors, exhibiting both statistical-computational and detection-recovery gaps. We also give finer-grained results for two extreme cases: (1) $p = 1$, $k = n$, and $q$ small, where a full tournament is observed that is weakly correlated with a global ranking, and (2) $p = 1$, $q = \frac{1}{2}$, and $k$ small, where a small "ordered clique" (totally ordered directed subgraph) is planted in a random tournament.
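The generative model described in the abstract can be sketched as a sampler. This is a minimal illustration under stated assumptions, not the authors' code: the function name `sample_prs`, the $\pm 1 / 0$ matrix encoding, and the convention that a higher-ranked item "beats" a lower-ranked one are all choices made here. The abstract's "a fraction $p$ of pairwise orderings" is read as each pair being observed independently with probability $p$.

```python
import numpy as np


def sample_prs(n, k, p, q, rng):
    """Sample one planted ranked subgraph instance.

    Returns (A, S, rank), where
      A[i, j] = +1 if the observed ordering says i beats j,
                -1 if j beats i, and 0 if the pair is unobserved;
      S       = the k planted items;
      rank[i] = position of i in the hidden ranking (0 = top), or -1 if i is not in S.
    """
    S = rng.choice(n, size=k, replace=False)
    rank = np.full(n, -1)
    rank[S] = rng.permutation(k)

    iu, ju = np.triu_indices(n, k=1)          # every unordered pair i < j
    observed = rng.random(iu.size) < p        # each pair is seen with probability p
    sign = rng.choice([-1, 1], size=iu.size)  # default: uniformly random direction

    # Pairs inside S agree with the hidden ranking with probability 1/2 + q.
    both_in_S = (rank[iu] >= 0) & (rank[ju] >= 0)
    planted = np.where(rank[iu] < rank[ju], 1, -1)
    agree = rng.random(iu.size) < 0.5 + q
    sign = np.where(both_in_S, np.where(agree, planted, -planted), sign)

    sign = sign * observed                    # zero out unobserved pairs
    A = np.zeros((n, n), dtype=int)
    A[iu, ju] = sign
    A[ju, iu] = -sign
    return A, S, rank


rng = np.random.default_rng(0)
# Extreme case (2) of the abstract: p = 1, q = 1/2, small k ("ordered clique").
A, S, rank = sample_prs(n=40, k=6, p=1.0, q=0.5, rng=rng)
```

Because pairs outside $S$ are oriented uniformly at random and observation does not depend on membership in $S$, the number of observed edges carries no information about $S$. Only the consistency of orientations within $S$ does, which is the distinction from planted dense subgraph that the abstract emphasizes.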

Paper Structure

This paper contains 51 sections, 39 theorems, 234 equations, 4 figures.

Key Result

Theorem 1.5

The following hold:

Figures (4)

  • Figure 1: Computational and statistical thresholds for detection and recovery in the planted ranked subgraph model in the log-density setting. The green, yellow, and red regions indicate where each problem is computationally tractable, computationally hard but statistically tractable, and statistically impossible, respectively.
  • Figure 2: All possible patterns in which two paths of length $2$ can intersect non-trivially.
  • Figure 3: A description of the spectral algorithm analyzed in Theorem \ref{thm:log-density-comp-rec}.
  • Figure 4: A description of the spectral algorithm analyzed in Theorem \ref{thm:planted-ordered-clique-rec}.

Theorems & Definitions (77)

  • Definition 1.1: Strong and weak detection
  • Definition 1.2: Hamming distance
  • Definition 1.3: Kendall tau distance
  • Definition 1.4: Exact, strong, and weak recovery
  • Theorem 1.5: Computational detection in log-density setting
  • Theorem 1.6: Statistical detection in log-density setting
  • Theorem 1.7: Computational recovery in log-density setting
  • Theorem 1.8: Statistical recovery in log-density setting
  • Theorem 1.9: Detection of global ranking through tournament
  • Theorem 1.10: Spectral detection thresholds
  • ...and 67 more