Scalable Private Search with Wally
Hilal Asi, Fabian Boemer, Nicholas Genise, Muhammad Haris Mughees, Tabitha Ogilvie, Rehan Rishi, Kunal Talwar, Karl Tarbe, Akshay Wadia, Ruiyu Zhu, Marco Zuliani
TL;DR
Wally makes private search practical at large scale by relaxing the privacy guarantee to $(\epsilon,\delta)$-differential privacy (DP) and combining epoch-based batching, fake queries, anonymized routing, and somewhat homomorphic encryption (SHE) to reduce server computation and data transfer. The system clusters the database embeddings and selects a subset of clusters for each query. DP hides which clusters are queried, while the query embeddings are encrypted and results are retrieved through encrypted dot products. Key contributions include a formal DP security proof in the central model, a standalone open-source BFV-based SHE library with optimizations, and evaluations showing 7–28x higher QPS and 6.69–31x lower bandwidth than Tiptoe on a 3.2M-entry MSMARCO-like dataset, with acceptable MRR@100. Because the overhead diminishes as user participation increases, the work suggests that privacy-preserving search can be deployed for real-world workloads in large online ecosystems.
Abstract
This paper presents Wally, a private search system that supports efficient search queries against large databases. When sufficiently many clients are making queries, Wally's performance is significantly better than that of previous systems while providing a standard privacy guarantee of $(\epsilon, \delta)$-differential privacy. Specifically, for a database with 3.2 million entries, Wally achieves 7–28x higher queries per second (QPS) and 6.69–31x lower communication than Tiptoe, a state-of-the-art private search system. In Wally, each client adds a few fake queries and sends each query via an anonymous network to the server at independently chosen random instants. We also use somewhat homomorphic encryption (SHE) to reduce the communication size. The number of fake queries each client makes is inversely proportional to the number of clients making queries. Therefore, the overhead of fake queries vanishes as the number of honest clients increases, enabling scalability to millions of queries and large databases.
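The client-side behavior described above (a few fake queries per client, each query sent at an independently chosen random instant, with fake-query overhead shrinking as more clients participate) can be illustrated with a minimal Python sketch. It is not the paper's mechanism: the names `total_fake_target`, `expected_fake_per_client`, `sample_fake_count`, and `build_epoch_schedule` are hypothetical, fake clusters are drawn uniformly purely for illustration, and the sketch makes no claim of satisfying $(\epsilon, \delta)$-differential privacy. The only property it mirrors is that the per-client fake-query count scales as the inverse of the number of clients.

```python
import random


def expected_fake_per_client(total_fake_target: float, num_clients: int) -> float:
    """Expected fake queries per client, inversely proportional to the client count.

    `total_fake_target` is a hypothetical system-wide budget; in a real system it
    would be derived from the privacy parameters, which this sketch does not model.
    """
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    return total_fake_target / num_clients


def sample_fake_count(expected: float, rng: random.Random) -> int:
    """Randomized rounding: returns an integer whose mean equals `expected`."""
    base = int(expected)
    return base + (1 if rng.random() < expected - base else 0)


def build_epoch_schedule(real_clusters, num_clusters, num_fake, epoch_seconds, rng):
    """Mix real and fake cluster queries and assign each an independent send time.

    Returns a list of (send_time, cluster_index, is_real) tuples sorted by time.
    Fake cluster indices are uniform here, which is an assumption of this sketch.
    """
    fake_clusters = [rng.randrange(num_clusters) for _ in range(num_fake)]
    queries = [(c, True) for c in real_clusters] + [(c, False) for c in fake_clusters]
    schedule = [
        (rng.uniform(0.0, epoch_seconds), cluster, is_real)
        for cluster, is_real in queries
    ]
    schedule.sort(key=lambda item: item[0])
    return schedule
```

With these assumed numbers, a budget of 1000 fake queries spread over 10 clients gives 100 fake queries each, while the same budget over one million clients gives an expected 0.001 per client, which is the sense in which the overhead vanishes as participation grows.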
