REWA: A General Theory of Witness-Based Similarity

Nikit Phadke

TL;DR

This work introduces REWA, a universal theory of witness-based similarity that reframes diverse similarity methods as functional witness projections over monoids. It proves an $O(\log N)$ encoding with ranking preservation under a $\Delta$-gap, leveraging 4-wise independent hashing and monotone witness aggregation. The authors instantiate REWA across Boolean, Natural, Real, and Tropical domains, connecting Bloom filters/LSH, Count-Min Sketch, Random Fourier Features, and graph-based shortest-paths with a single unifying mechanism. They also develop compositional, multi-channel encodings for hybrid search and discuss practical failure modes and defenses, positioning REWA as a foundation for future multi-modal retrieval systems.

Abstract

We present a universal framework for similarity-preserving encodings that subsumes all discrete, continuous, algebraic, and learned similarity methods under a single theoretical umbrella. By formulating similarity as functional witness projection over monoids, we prove that $O\!\left(\frac{1}{\Delta^{2}}\log N\right)$ encoding complexity with ranking preservation holds for arbitrary algebraic structures. This unification reveals that Bloom filters, Locality Sensitive Hashing (LSH), Count-Min sketches, Random Fourier Features, and Transformer attention kernels are instances of the same underlying mechanism. We provide complete proofs with explicit constants under 4-wise independent hashing, handle heavy-tailed witnesses via normalization and clipping, and prove $O(\log N)$ complexity for all major similarity methods from 1970-2024. We give explicit constructions for Boolean, Natural, Real, Tropical, and Product monoids, prove tight concentration bounds, and demonstrate compositional properties enabling multi-primitive similarity systems.
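To make the Boolean case concrete, the sketch below shows one way a witness set could be hashed into a short bit vector and compared by bit overlap, in the style of a Bloom filter. It is an illustration under stated assumptions, not the paper's construction: the function names, the sizing constant `c`, and the choice of `num_hashes` are hypothetical, and seeded BLAKE2b digests stand in for the 4-wise independent hash family the abstract assumes.

```python
import hashlib
import math


def code_length(n_items, gap, c=8.0):
    """Hypothetical sizing rule m = ceil(c / gap^2 * ln N).

    The constant c is an assumption for illustration; the abstract only
    states the O((1/Delta^2) log N) scaling.
    """
    return math.ceil(c / gap ** 2 * math.log(n_items))


def _position(witness, seed, m):
    # Stand-in hash: a seeded BLAKE2b digest reduced mod m.
    # This is NOT a 4-wise independent family; it only keeps the sketch deterministic.
    digest = hashlib.blake2b(f"{seed}:{witness}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


def encode_boolean(witnesses, m, num_hashes=4):
    """Project a witness set into m bits, aggregating with OR (Boolean monoid)."""
    bits = [0] * m
    for w in witnesses:
        for seed in range(num_hashes):
            bits[_position(w, seed, m)] = 1
    return bits


def overlap(a, b):
    """Similarity score: number of positions set in both encodings."""
    return sum(x & y for x, y in zip(a, b))


m = code_length(n_items=1000, gap=0.25)
query = encode_boolean({"a", "b", "c", "d"}, m)
near = encode_boolean({"a", "b", "c", "x"}, m)   # shares three witnesses with the query
far = encode_boolean({"p", "q", "r", "s"}, m)    # shares none

print(overlap(query, near), overlap(query, far))
```

With these toy sets, the item sharing three witnesses with the query should score well above the disjoint one, which is the ranking behavior the encoding is meant to preserve. Swapping the OR aggregation for addition would give a Count-Min-style counter array, corresponding to the Natural monoid mentioned in the abstract.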
