Locality Sensitive Hashing in Hyperbolic Space

Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin

Abstract

For a metric space $(X, d)$, a family $\mathcal{H}$ of locality sensitive hash functions is called $(r, cr, p_1, p_2)$-sensitive if a randomly chosen function $h\in \mathcal{H}$ has probability at least $p_1$ (at most $p_2$) to map any $a, b\in X$ to the same hash bucket if $d(a, b)\leq r$ (or $d(a, b)\geq cr$). Locality Sensitive Hashing (LSH) is one of the most popular techniques for approximate nearest-neighbor search in high-dimensional spaces, and has been studied extensively for Hamming, Euclidean, and spherical geometries. An $(r, cr, p_1, p_2)$-sensitive hash function enables approximate nearest neighbor search (i.e., returning a point within distance $cr$ from a query $q$ if there exists a point within distance $r$ from $q$) with space $O(n^{1+\rho})$ and query time $O(n^{\rho})$, where $\rho=\frac{\log 1/p_1}{\log 1/p_2}$. But LSH for hyperbolic spaces $\mathbb{H}^d$ remains largely unexplored. In this work, we present the first LSH construction native to hyperbolic space. For the hyperbolic plane $(d=2)$, we show a construction achieving $\rho\leq 1/c$, based on the hyperplane rounding scheme. For general hyperbolic spaces $(d \geq 3)$, we use dimension reduction from $\mathbb{H}^d$ to $\mathbb{H}^2$ and the 2D hyperbolic LSH to get $\rho\leq 1.59/c$. On the lower bound side, we show that the lower bound on $\rho$ of Euclidean LSH extends to the hyperbolic setting via local isometry, therefore giving $\rho\geq 1/c^2$.

Paper Structure (12 sections, 13 theorems, 22 equations, 7 figures)


Key Result

Proposition 3

Given an $(r,cr,p_1,p_2)$-sensitive family $\mathcal{H}$, the $(c, r)$-ANNS problem can be solved using $O(n^{1+\rho})$ space, and $O(n^\rho\log n)$ query time, where $\rho = \rho (\mathcal{H})= \frac{\log(1/p_1)}{\log (1/p_2)}$.
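To make the exponent in Proposition 3 concrete, the following minimal sketch evaluates $\rho = \frac{\log(1/p_1)}{\log(1/p_2)}$ and the resulting space and query-time growth terms. The function names and the sample values of $p_1$, $p_2$, and $n$ are hypothetical choices for illustration, and the constants hidden by the $O(\cdot)$ notation are simply dropped.

```python
import math


def lsh_exponent(p1: float, p2: float) -> float:
    """Return rho = log(1/p1) / log(1/p2) for an (r, cr, p1, p2)-sensitive family."""
    if not (0.0 < p2 < p1 < 1.0):
        raise ValueError("expected 0 < p2 < p1 < 1")
    return math.log(1.0 / p1) / math.log(1.0 / p2)


def anns_growth_terms(n: int, p1: float, p2: float) -> tuple[float, float]:
    """Return (n^(1+rho), n^rho * log n), the growth terms from Proposition 3.

    These are the expressions inside the O(.) bounds, with constants ignored.
    """
    rho = lsh_exponent(p1, p2)
    space = n ** (1.0 + rho)
    query = (n ** rho) * math.log(n)
    return space, query


if __name__ == "__main__":
    # Hypothetical collision probabilities, not taken from the paper.
    p1, p2 = 0.5, 0.25
    rho = lsh_exponent(p1, p2)
    space, query = anns_growth_terms(1_000_000, p1, p2)
    print(f"rho = {rho:.3f}, space term = {space:.3e}, query term = {query:.3e}")
```

A smaller $\rho$ means both terms grow more slowly in $n$, which is why the upper and lower bounds on $\rho$ are the quantities of interest.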

Figures (7)

  • Figure 1: Illustration of the hashing scheme in $\mathbb H^2$. If two points are far, then they are more likely to be separated by a random geodesic (the green arc). Points in blue are assigned label $+1$ and points in red are assigned label $-1$.
  • Figure 3: Average $p_1, p_2$ of Hyperbolic LSH with varying $c$ for different dimensions ($y$-values do not start at 0)
  • Figure 4: Average $\rho$ of Hyperbolic LSH with varying $c$ for different dimensions ($y$-values do not start at 0)
  • Figure 5: Average $\rho$ and average $p_1, p_2$ in high dimension when $c$ is small ($y$-values do not start at 0)
  • Figure 6: Average $\rho$ and average $p_1, p_2$ in high dimension with $R = 0.9$ in Euclidean coordinates ($y$-values do not start at 0)
  • ...and 2 more figures
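The idea in Figure 1, labeling points $+1$ or $-1$ by the side of a random geodesic they fall on, can be sketched as follows. This is only an illustration under stated assumptions: points are given in Poincaré disk coordinates, and the geodesic is sampled by drawing its two ideal endpoints uniformly on the unit circle. That sampling rule is a placeholder and should not be read as the paper's distribution, which this summary does not specify. The side test uses the Klein model, where geodesics are straight chords with the same ideal endpoints. All function names are hypothetical.

```python
import math
import random


def poincare_to_klein(p):
    """Map a Poincare-disk point to the Klein disk: k = 2p / (1 + |p|^2)."""
    x, y = p
    s = 1.0 + x * x + y * y
    return (2.0 * x / s, 2.0 * y / s)


def sample_geodesic(rng):
    """Sample a geodesic as a pair of ideal endpoints on the unit circle.

    ASSUMPTION: endpoints are uniform on the circle, chosen only for
    illustration; this is not claimed to be the paper's distribution.
    """
    t1 = rng.uniform(0.0, 2.0 * math.pi)
    t2 = rng.uniform(0.0, 2.0 * math.pi)
    return (math.cos(t1), math.sin(t1)), (math.cos(t2), math.sin(t2))


def hash_point(p, geodesic):
    """Return +1 or -1 according to the side of the geodesic containing p."""
    u, v = geodesic
    k = poincare_to_klein(p)
    # Sign of the 2D cross product (v - u) x (k - u): side of the chord uv.
    cross = (v[0] - u[0]) * (k[1] - u[1]) - (v[1] - u[1]) * (k[0] - u[0])
    return 1 if cross >= 0.0 else -1


def collision_rate(a, b, trials=20000, seed=0):
    """Estimate the probability that a and b receive the same label."""
    rng = random.Random(seed)
    same = 0
    for _ in range(trials):
        g = sample_geodesic(rng)
        same += hash_point(a, g) == hash_point(b, g)
    return same / trials


if __name__ == "__main__":
    origin, near, far = (0.0, 0.0), (0.05, 0.0), (0.9, 0.0)
    print("near pair:", collision_rate(origin, near))
    print("far pair: ", collision_rate(origin, far))
```

Under this placeholder distribution, the far pair should collide less often than the near pair, matching the qualitative behavior the figure describes.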

Theorems & Definitions (16)

  • Definition 1: Locality Sensitive Hashing
  • Definition 2: $(c, r)$-approximate Nearest Neighbor Search Problem
  • Proposition 3
  • Theorem 4
  • Theorem 5: Informal Version of \ref{thm:JLproof}
  • Theorem 6
  • Theorem 7
  • Lemma 8
  • Lemma 10: Lemma 1 of [datar2004locality]
  • Definition 11
  • ...and 6 more