Terminal Embeddings in Sublinear Time
Yeshwanth Cherapanamjeri, Jelani Nelson
TL;DR
The paper tackles computing terminal embeddings between Euclidean spaces in time sublinear in the number of terminals, removing the bottleneck of previous SDP-based methods. It develops a data-structure-driven approach built from a fixed-scale violator detector, a multi-scale reduction via partition trees, and an adaptive nearest-neighbor framework, complemented by a Median-JL technique to decouple dimension from data size. The main result is a terminal embedding into dimension $m=O(\varepsilon^{-2}\log n)$ with a Monte Carlo procedure that computes $f(q)$ for any query $q$ in time $O^*(n^{1-\Theta(\varepsilon^{2})}+d)$ and space $O^*(nd)$, robust to adaptive querying. This enables fast, adaptivity-safe similarity queries in high dimensions and extends low-dimensional embeddings to settings where queries arrive after preprocessing, with potential applications to adaptive nearest-neighbor search and related geometric data-structure problems.
Abstract
Elkin, Filtser, and Neiman (2017) recently introduced the concept of a terminal embedding from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying \begin{equation*} \forall x\in T\ \forall q\in X,\ C\, d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho\, d_X(x, q) . \end{equation*} When $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, Narayanan and Nelson (2019), following work of Mahabadi, Makarychev, Makarychev, and Razenshteyn (2018), showed that distortion $1+\varepsilon$ is achievable via such a terminal embedding with $m = O(\varepsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not distances from $T$ to the rest of the space. The downside of prior work is that evaluating the embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in $m$ variables, and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $O^* (n^{1-\Theta(\varepsilon^2)} + d)$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.
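To make the distortion definition concrete, the following is a minimal sketch that measures the distortion of a map empirically over a finite sample of terminal-query pairs. It is not the data structure from the paper: the helper name `empirical_distortion`, the sample sizes, and the use of a plain Gaussian random projection as the map are all illustrative assumptions. A linear random projection preserves distances only for queries chosen independently of the projection, so the number it reports here says nothing about the worst-case guarantee over all $q\in\mathbb{R}^d$ that a terminal embedding provides.

```python
import numpy as np


def empirical_distortion(T, Q, f):
    """Return (rho, C) for the map f over all sampled (terminal, query) pairs.

    Hypothetical helper for illustration. T has shape (n, d), Q has shape
    (k, d), and f maps an array of row vectors to an array of row vectors.
    rho is the smallest value such that, for C = the smallest observed ratio,
        C * d(x, q) <= d(f(x), f(q)) <= C * rho * d(x, q)
    holds on every sampled pair with d(x, q) > 0.
    """
    fT, fQ = f(T), f(Q)
    # Pairwise Euclidean distances before and after applying f.
    d_in = np.linalg.norm(T[:, None, :] - Q[None, :, :], axis=2)
    d_out = np.linalg.norm(fT[:, None, :] - fQ[None, :, :], axis=2)
    mask = d_in > 0  # skip coincident points, whose ratio is undefined
    ratios = d_out[mask] / d_in[mask]
    C = ratios.min()
    return ratios.max() / C, C


# Assumed toy setup: 50 terminals and 20 queries in dimension 200,
# projected to dimension 64 by a scaled Gaussian matrix.
rng = np.random.default_rng(0)
T = rng.standard_normal((50, 200))
Q = rng.standard_normal((20, 200))
G = rng.standard_normal((200, 64)) / np.sqrt(64)

rho, C = empirical_distortion(T, Q, lambda X: X @ G)
print(f"empirical distortion rho = {rho:.3f} with C = {C:.3f}")
```

Taking $C$ to be the smallest observed ratio is what makes the returned $\rho$ the smallest admissible value on the sample, which mirrors the "smallest value such that there exists a constant" wording in the definition.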
