Privacy-Preserving Hamming Distance Computation with Property-Preserving Hashing
Dongfang Zhao
TL;DR
This work addresses securely estimating Hamming distance under property-preserving hashing (PPH), a setting where only hashed inputs are available. It progresses from a baseline polylog-time approach based on threshold predicates, to a refined log-time method without amplification, and finally to a constant-time distance estimation scheme that embeds distance information directly into the hash outputs. The key contributions are: (i) a binary-search-based scheme with controlled error and sublinear query complexity, (ii) a constant-repetition variant that achieves $O(\log n)$ queries under structural assumptions, and (iii) a constant-time, non-interactive encoding that yields an accurate estimate of the Hamming distance $d_H$ with provable indistinguishability guarantees. Together, these results demonstrate that approximate Hamming distance can be recovered efficiently and securely from PPH encodings. This bridges efficient similarity estimation with cryptographic guarantees and enables privacy-preserving analytics in near-constant time.
Abstract
We study the problem of approximating Hamming distance in sublinear time under property-preserving hashing (PPH), where only hashed representations of inputs are available. Building on the threshold evaluation framework of Fleischhacker, Larsen, and Simkin (EUROCRYPT 2022), we present a sequence of constructions with progressively improved complexity: a baseline binary search algorithm, a refined variant with constant repetition per query, and a novel hash design that enables constant-time approximation without oracle access. Our results demonstrate that approximate distance recovery is possible under strong cryptographic guarantees, bridging efficiency and security in similarity estimation.
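The following minimal Python sketch illustrates the baseline idea of recovering $d_H$ by binary search over threshold predicates. It assumes an idealized, error-free oracle that answers "is $d_H(x, y) \le t$?" for a queried threshold $t$. The helpers `hamming`, `make_threshold_oracle`, and `estimate_distance` are hypothetical names, and the oracle is simulated on plaintext bit vectors rather than evaluated on PPH hashes. With an exact oracle, the search needs at most $\lceil \log_2(n+1) \rceil$ queries. If each predicate evaluation can err and is amplified by repetition and majority vote, the total query count grows by that repetition factor.

```python
from typing import Callable, List, Tuple


def hamming(x: List[int], y: List[int]) -> int:
    """Plaintext Hamming distance; used here only to simulate the oracle."""
    return sum(a != b for a, b in zip(x, y))


def make_threshold_oracle(x: List[int], y: List[int]) -> Callable[[int], bool]:
    """Hypothetical stand-in for a PPH threshold evaluation.

    Answers 'is d_H(x, y) <= t?' exactly. A real PPH scheme would evaluate
    this predicate on hashes only, possibly with a small error probability.
    """
    d = hamming(x, y)
    return lambda t: d <= t


def estimate_distance(oracle: Callable[[int], bool], n: int) -> Tuple[int, int]:
    """Binary search for the smallest t in [0, n] with oracle(t) == True.

    Returns (estimate, number_of_oracle_queries). Since d_H <= n always,
    oracle(n) is True, so the search interval never becomes empty.
    """
    lo, hi = 0, n
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(mid):
            hi = mid        # d_H <= mid: the answer lies in [lo, mid]
        else:
            lo = mid + 1    # d_H > mid: the answer lies in [mid + 1, hi]
    return lo, queries


if __name__ == "__main__":
    x = [0, 1, 1, 0, 1, 0, 0, 1]
    y = [1, 1, 0, 0, 1, 1, 0, 1]
    print(estimate_distance(make_threshold_oracle(x, y), len(x)))  # (3, 3)
```

For the 8-bit example, the distance is 3 and the search probes $t = 4, 2, 3$, recovering it with three queries. Each halving step uses one threshold query, which is where the logarithmic query count comes from.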
