A Survey on Locality Sensitive Hashing Algorithms and their Applications
Omid Jafari, Preeti Maurya, Parth Nagarkar, Khandker Mushfiqul Islam, Chidambaram Crushev
TL;DR
This survey targets the approximate nearest neighbor (ANN) problem in high-dimensional spaces and argues for Locality Sensitive Hashing (LSH) as a scalable, theory-backed approach. It classifies LSH techniques by distance metric, surveys numerous improvements (multi-probe, data-dependent, kernelized, Bayesian), and details distributed frameworks that boost throughput. A major contribution is a comprehensive catalog of applications across audio, image/video, security/privacy, biology, geoscience, graphs, time series, healthcare, and robotics, illustrating LSH’s practical impact. The paper thus provides both methodological guidance and domain-specific workflows for implementing efficient approximate nearest neighbor search in real-world settings.
Abstract
Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many diverse application domains. Locality Sensitive Hashing (LSH) is one of the most popular techniques for approximate nearest neighbor search in high-dimensional spaces. The main benefits of LSH are its sub-linear query performance and its theoretical guarantees on query accuracy. In this survey paper, we provide a review of state-of-the-art LSH and Distributed LSH techniques. Most importantly, unlike prior surveys, we present how Locality Sensitive Hashing is utilized in different application domains.
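To make the core idea concrete, the following is a minimal sketch of one well-known LSH family, random-hyperplane hashing for cosine similarity. It is an illustration only and is not taken from this survey: the function names (`make_hyperplanes`, `signature`, `build_index`, `query`), the single hash table, and the parameter values are all assumptions chosen for brevity. Vectors that point in similar directions tend to receive the same bit signature, so a query only compares against the points in its own bucket rather than the whole dataset.

```python
import math
import random
from collections import defaultdict

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(points, planes):
    """Group point indices by signature (a single hash table)."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(points):
        buckets[signature(vec, planes)].append(idx)
    return buckets

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def query(q, points, planes, buckets):
    """Return indices from the query's bucket, most similar first."""
    candidates = buckets.get(signature(q, planes), [])
    return sorted(candidates, key=lambda i: cosine(q, points[i]), reverse=True)

# Hypothetical toy data: three 3-dimensional points.
points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
planes = make_hyperplanes(num_bits=8, dim=3, seed=1)
buckets = build_index(points, planes)
print(query(points[0], points, planes, buckets))
```

A single table like this can miss true neighbors that fall just across a hyperplane. Practical schemes typically trade extra space or extra probes for better recall, for example by using several independent tables.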
