Femur: A Flexible Framework for Fast and Secure Querying from Public Key-Value Store

Jiaoyi Zhang, Liqiang Peng, Mo Sha, Weiran Liu, Xiang Li, Sheng Wang, Feifei Li, Mingyu Gao, Huanchen Zhang

TL;DR

Femur addresses the challenge of privately querying large public key-value stores without incurring the prohibitive costs of full PIR. It introduces distance-based indistinguishability to relax privacy guarantees in a controlled, provable manner and combines a learned index (PGM-index) with an offline initialization and online query workflow. The framework supports two retrieval modes, plaintext download and a novel variable-range PIR, and uses a lightweight cost model to adaptively select the optimal scheme per query. Empirical results on a 200M-record dataset show substantial speedups over state-of-the-art PIR systems, with up to 163.9X gains under relaxed privacy and a realistic offline initialization time, demonstrating practical scalability for real-world deployments like Redis integrations.

Abstract

With increasing demands for privacy, it becomes necessary to protect sensitive user query data when accessing public key-value databases. Existing Private Information Retrieval (PIR) schemes provide full security but suffer from poor scalability, limiting their applicability in large-scale deployments. We argue that in many real-world scenarios, a more practical solution should allow users to flexibly determine the privacy levels of their queries in a theoretically guided way, balancing security and performance based on specific needs. To provide provable guarantees, we introduce a novel concept of distance-based indistinguishability, which allows users to relax their security requirements with confidence. We then design Femur, an efficient framework to securely query public key-value stores with flexible security and performance trade-offs. It uses a space-efficient learned index to convert query keys into storage locations, obfuscates these locations with extra noise provably derived from the distance-based indistinguishability theory, and sends the expanded range to the server. The server then adaptively selects the best scheme to retrieve data. We also propose a novel variable-range PIR scheme optimized for bandwidth-constrained environments. Experiments show that Femur outperforms the state-of-the-art designs even when ensuring the same full security level. When users are willing to relax their privacy requirements, Femur further increases the performance gains, up to 163.9X, demonstrating an effective trade-off between security and performance.

Paper Structure

This paper contains 44 sections, 3 theorems, 6 equations, 9 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{M}:\mathbb{D}\rightarrow \mathbb{O}$ be an $\epsilon$-dLDP obfuscation mechanism, and $f: \mathbb{O} \rightarrow \mathbb{O}'$ be any randomized function. Then, $f \circ \mathcal{M}$ remains $\epsilon$-dLDP.
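
To make the post-processing property concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes the commonly used form of distance-based LDP, in which $\Pr[\mathcal{M}(x)=o] \le e^{\epsilon \cdot |x-x'|}\,\Pr[\mathcal{M}(x')=o]$ for storage positions $x, x'$, and it assumes the discrete Laplace distribution is parameterized as $\Pr[Z=k] \propto e^{-\epsilon |k|}$; the paper's exact definitions may differ. All function names (`discrete_laplace_pmf`, `obfuscate`, `clamp`, `max_log_ratio`) are hypothetical. Clamping the noisy position to the valid index range plays the role of the function $f$ in Theorem 1.

```python
import math
import random


def discrete_laplace_pmf(k: int, eps: float) -> float:
    """Pr[Z = k] for the (assumed) parameterization p = exp(-eps)."""
    p = math.exp(-eps)
    return (1.0 - p) / (1.0 + p) * p ** abs(k)


def _sample_geometric(p: float, rng: random.Random) -> int:
    """Number of failures before the first success, with failure probability p."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.floor(math.log(u) / math.log(p)))


def sample_discrete_laplace(eps: float, rng: random.Random) -> int:
    """The difference of two i.i.d. geometric variables is discrete Laplace."""
    p = math.exp(-eps)
    return _sample_geometric(p, rng) - _sample_geometric(p, rng)


def obfuscate(position: int, eps: float, rng: random.Random) -> int:
    """Mechanism M: add discrete Laplace noise to a storage position."""
    return position + sample_discrete_laplace(eps, rng)


def clamp(noisy_position: int, n: int) -> int:
    """Post-processing f: force the noisy position into [0, n - 1]."""
    return max(0, min(n - 1, noisy_position))


def max_log_ratio(x: int, x_other: int, eps: float, outputs) -> float:
    """Largest log-likelihood ratio between inputs x and x_other over `outputs`."""
    return max(
        math.log(discrete_laplace_pmf(o - x, eps) / discrete_laplace_pmf(o - x_other, eps))
        for o in outputs
    )


if __name__ == "__main__":
    rng = random.Random(7)
    eps, n = 0.5, 128
    noisy = obfuscate(100, eps, rng)
    print("noisy position:", noisy, "after clamping:", clamp(noisy, n))
    # Under the assumed definition, the bound for two inputs at distance 3 is eps * 3.
    print("max log ratio:", max_log_ratio(100, 103, eps, range(0, 200)))
```

Because `clamp` looks only at the mechanism's output and never at the true position, applying it cannot weaken the guarantee, which is what Theorem 1 states for any randomized $f$.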

Figures (9)

  • Figure 1: The Core Components of Femur Framework.
  • Figure 2: An Example of the PGM-index.
  • Figure 3: An Example of Data Encoding and Query Processing in Variable-Range PIR.
  • Figure 4: Total Online Execution Time for 100 Queries -- Femur uses 8 relaxed security levels besides full security. Pantheon and Chalamet encountered out-of-memory issues on 8 threads, so their results are single-thread times scaled assuming ideal speedup. Pantheon failed to complete on this large dataset after running for 24 hours on a single thread, so we mark its (lower-bound) time as 24/8 = 3 hours.
  • Figure 5: Online Execution Time for Each Query on Different Dataset Sizes -- The number above each bar represents the speedup over Chalamet. Pantheon timed out on datasets of size $2^{24}$ and $2^{26}$. The shadowed area represents the server-side computation time, including the time required to serialize the data to be sent.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Definition 1: Local Differential Privacy, LDP
  • Definition 2: Distance-based Local Differential Privacy
  • Theorem 1: Post-Processing (Dwork and Roth, 2014)
  • Theorem 2
  • Definition 3: Discrete Laplace Distribution
  • Theorem 3