Finding the root in random nearest neighbor trees

Anna Brandenberger, Cassandra Marcussen, Elchanan Mossel, Madhu Sudan

TL;DR

The paper shows that efficient root finding algorithms exist for embedded and metric root finding, and derives upper and lower bounds on the size of the confidence set for embedded root finding.

Abstract

We study the inference of network archaeology in growing random geometric graphs. We consider the root finding problem for a random nearest neighbor tree in dimension $d \in \mathbb{N}$, generated by sequentially embedding vertices uniformly at random in the $d$-dimensional torus and connecting each new vertex to the nearest existing vertex. More precisely, given an error parameter $\varepsilon > 0$ and the unlabeled tree, we want to efficiently find a small set of candidate vertices, such that the root is included in this set with probability at least $1 - \varepsilon$. We call such a candidate set a $\textit{confidence set}$. We define several variations of the root finding problem in geometric settings -- embedded, metric, and graph root finding -- which differ in the type of metric information provided in addition to the graph structure (torus embedding, edge lengths, or no additional information, respectively). We show that there exist efficient root finding algorithms for embedded and metric root finding. For embedded root finding, we derive upper and lower bounds (uniformly bounded in $n$) on the size of the confidence set: the upper bound is subpolynomial in $1/\varepsilon$ and stems from an explicit efficient algorithm, and the information-theoretic lower bound is polylogarithmic in $1/\varepsilon$. In particular, in $d=1$, we obtain matching upper and lower bounds for a confidence set of size $\Theta\left(\frac{\log(1/\varepsilon)}{\log \log(1/\varepsilon)} \right)$.
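The generative model described in the abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes the unit torus $[0,1)^d$ with the wrap-around Euclidean metric, and the names `torus_distance` and `nearest_neighbor_tree` are hypothetical. It also keeps arrival order as vertex labels, which the root finding problem deliberately hides.

```python
import random


def torus_distance(p, q):
    """Euclidean distance on the unit torus [0,1)^d (assumed metric)."""
    total = 0.0
    for a, b in zip(p, q):
        diff = abs(a - b)
        diff = min(diff, 1.0 - diff)  # coordinates wrap around
        total += diff * diff
    return total ** 0.5


def nearest_neighbor_tree(n, d, seed=None):
    """Grow a random nearest neighbor tree with n vertices in dimension d.

    Returns (points, parent): points[i] is the position of the i-th
    arrival, and parent[i] is the index of the vertex it attached to
    (None for the root, which is vertex 0).
    """
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(d))]  # the root
    parent = [None]
    for _ in range(1, n):
        new_point = tuple(rng.random() for _ in range(d))
        # Attach the new vertex to the nearest existing vertex.
        nearest = min(
            range(len(points)),
            key=lambda j: torus_distance(points[j], new_point),
        )
        points.append(new_point)
        parent.append(nearest)
    return points, parent


if __name__ == "__main__":
    points, parent = nearest_neighbor_tree(n=20, d=2, seed=0)
    edges = [(i, parent[i]) for i in range(1, len(parent))]
    print(len(edges), "edges")
```

The linear scan for the nearest vertex makes this sketch quadratic in `n`, which is adequate for small experiments; a spatial index would be the natural replacement for larger trees.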

Paper Structure

This paper contains 19 sections, 12 theorems, 32 equations, 3 figures.

Key Result

Theorem 1.1

There exist $c_1, c_2 > 0$ such that the following holds for the 1-NN model for all $n \in \mathbf{N}$ and all sufficiently small $\varepsilon > 0$. There exists a $O(n^2 + \log^2(1/\varepsilon))$-time embedded root finding algorithm that returns $H(\varepsilon, n)$ of size satisfying $|H(\varepsilo

Figures (3)

  • Figure 1: Our algorithm returns uncovered vertices within a graph distance of $k$ of an edge of length at least $\ell$, where $k$ and $\ell$ are defined with respect to $\varepsilon$. In this figure, the vertices highlighted in orange are added to the confidence set since they are uncovered and graph-theoretically near an edge of length at least $\ell$. In the figure, uncovered vertices and edges are green and covered vertices and edges are red.
  • Figure 2: When the thickness of the strip is small enough, we argue that when projecting to $\mathbf{T}^1$, the uncovered subgraph of $\pi(T_t)$ up to time $t = \text{poly}(1/\varepsilon)$ is consistent with the uncovered graph of a $1$-NN tree with vertices in the same locations. We can therefore use algorithms from the $d=1$ setting over the thin two-dimensional torus.
  • Figure 3: Simulation of a 2-NN tree of size 10,000. Uncovered edges are labeled as blue and covered edges are labeled as red. The first 20 vertices added are labeled by their time of arrival. Edges that wrap around the torus are dotted. The simulations indicate that the number of uncovered vertices in the tree grows slowly with the number of vertices in the tree.

Theorems & Definitions (39)

  • Definition 1.1
  • Theorem 1.1
  • Theorem 1.2
  • Remark 1
  • Lemma 2.1
  • proof
  • Lemma 2.2
  • Lemma 2.3
  • Definition 3.1: Uncovered vertices and edges
  • Definition 3.2: Remaining uncovered interval
  • ...and 29 more