Terminal Embeddings in Sublinear Time

Yeshwanth Cherapanamjeri, Jelani Nelson

TL;DR

The paper tackles computing terminal embeddings between Euclidean spaces in sublinear time relative to the number of terminals, addressing the bottleneck of previous SDP-based methods. It develops a data-structure–driven approach built from a fixed-scale violator detector, a multi-scale reduction via partition trees, and an adaptive nearest-neighbor framework, complemented by a Median–JL technique to decouple dimension from data size. The main result is a terminal embedding into dimension $k=O(\varepsilon^{-2}\log n)$ with a Monte Carlo procedure that computes $f(q)$ for any query $q$ in time $O^*(n^{1-\Theta(\varepsilon^{2})}+d)$ and space $O^*(nd)$, robust to adaptive querying. This enables fast, adaptivity-safe similarity queries in high dimensions and broadens the applicability of low-dimensional embeddings to dynamic query regimes, with potential impact on adaptive nearest-neighbor search and related geometric data-structure problems.

Abstract

Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a terminal embedding from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying \begin{equation*} \forall x\in T\ \forall q\in X,\ C\, d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho\, d_X(x, q) . \end{equation*} When $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+\varepsilon$ is achievable via such a terminal embedding with $m = O(\varepsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside of prior work is that evaluating their embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in $m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $O^*(n^{1-\Theta(\varepsilon^2)} + d)$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.
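
To make the distortion definition above concrete, the following is a minimal sketch that measures the empirical distortion of a map over a finite sample of (terminal, query) pairs. It is not the paper's data structure: the helper name `terminal_distortion`, the sizes `n`, `d`, `m`, and the use of a plain Gaussian Johnson-Lindenstrauss matrix as the map are all assumptions made for illustration. A linear map with $m < d$ cannot satisfy the guarantee for every $q\in\mathbb{R}^d$ (a query that differs from a terminal by a kernel vector collapses onto it), so the sketch only checks the inequality on queries drawn independently of the map.

```python
import numpy as np


def terminal_distortion(f, terminals, queries):
    """Empirical distortion of f over all sampled (terminal, query) pairs.

    For each pair (x, q) the ratio d_Y(f(x), f(q)) / d_X(x, q) is computed.
    The smallest rho admitting a constant C with
        C * d_X <= d_Y <= C * rho * d_X
    on these pairs is max(ratio) / min(ratio), with C = min(ratio).
    """
    fx = np.array([f(x) for x in terminals])
    fq = np.array([f(q) for q in queries])
    # Pairwise Euclidean distances, shape (num_terminals, num_queries).
    d_x = np.linalg.norm(terminals[:, None, :] - queries[None, :, :], axis=2)
    d_y = np.linalg.norm(fx[:, None, :] - fq[None, :, :], axis=2)
    mask = d_x > 0  # skip pairs where the query coincides with a terminal
    ratios = d_y[mask] / d_x[mask]
    return ratios.max() / ratios.min()


# Illustrative sizes only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n, d, m = 100, 200, 100
T = rng.standard_normal((n, d))   # terminals
Q = rng.standard_normal((20, d))  # queries sampled independently of the map

# Stand-in map: a scaled Gaussian matrix, as in the classical JL lemma.
G = rng.standard_normal((m, d)) / np.sqrt(m)
rho_jl = terminal_distortion(lambda v: G @ v, T, Q)

# A pure rescaling has distortion 1: the constant C absorbs the scale.
rho_scaled = terminal_distortion(lambda v: 3.0 * v, T, Q)

print(f"empirical distortion of the Gaussian map: {rho_jl:.3f}")
print(f"empirical distortion of a rescaling:      {rho_scaled:.3f}")
```

Because the constant $C$ is free, the measured quantity is invariant to rescaling the map, which is why the second call reports a value of essentially 1. The measurement is only a lower bound on the true distortion, since the definition quantifies over all $q\in X$ rather than a sample.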

Paper Structure

This paper contains 23 sections, 28 theorems, 108 equations, and 9 algorithms.

Key Result

theorem 1

Let $\varepsilon \in (0, 1)$, $\rho_1, \rho_2, \rho_3, \rho_4, \rho_{\mathrm{rep}} > 0$. Then, there is a randomized procedure which when instantiated with a dataset $X = \{x_i\}_{i = 1}^n \subset \mathbb{R}^d$, a $(\rho_3, \rho_4)$-Approximate Partitioning data structure, a $(\rho_1, \rho_2, (1 + \ldots$ ... with probability at least $1 - 1 / \mathrm{poly}(n)$ over the randomness during the instantiati...

Theorems & Definitions (65)

  • remark 1
  • definition 1: Terminal Embedding
  • definition 2: Outer Extension
  • definition 3
  • definition 4
  • definition 5
  • definition 6
  • definition 7
  • definition 8
  • theorem 1
  • ...and 55 more