PEPSI: Practically Efficient Private Set Intersection in the Unbalanced Setting

Rasoul Akhavan Mahdavi, Nils Lukas, Faezeh Ebrahimianghazani, Thomas Humphries, Bailey Kacsmar, John Premkumar, Xinda Li, Simon Oya, Ehsan Amjadian, Florian Kerschbaum

TL;DR

This work is the first to demonstrate that non-interactive circuit PSI can be practically applied in an unbalanced setting and is also up to 20 times faster than the work of Ion et al., which computes a limited set of functions and has communication costs proportional to the larger set.

Abstract

Two parties with private data sets can find shared elements using a Private Set Intersection (PSI) protocol without revealing any information beyond the intersection. Circuit PSI protocols privately compute an arbitrary function of the intersection, such as its cardinality, and are often employed in an unbalanced setting where one party has more data than the other. Existing protocols are either computationally inefficient or require extensive server-client communication on the order of the larger set. We introduce Practically Efficient PSI, or PEPSI, a non-interactive solution where only the client sends its encrypted data. PEPSI can process an intersection of 1024 client items with a million server items in under a second, using less than 5 MB of communication. Our work is over 4 orders of magnitude faster than an existing non-interactive circuit PSI protocol and requires only 10% of the communication. It is also up to 20 times faster than the work of Ion et al., which computes a limited set of functions and has communication costs proportional to the larger set. Our work is the first to demonstrate that non-interactive circuit PSI can be practically applied in an unbalanced setting.

Paper Structure (45 sections, 7 theorems, 11 equations, 7 figures, 7 tables, 7 algorithms)

Key Result

Lemma 1

When hashing $m$ client elements and $n$ server elements to $\lambda$-bit strings, the probability of failure in the protocol due to collisions is upper bounded by an expression in $b$, $\gamma$, and $\mu$, which denote the number of bins, the maximum client bin size, and the maximum server bin size, respectively.
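To make the quantities named in Lemma 1 concrete, the following is a minimal sketch of hashing items to $\lambda$-bit strings, placing them into $b$ bins, and measuring the maximum bin load (which plays the role of $\gamma$ for the client and $\mu$ for the server). The choice of truncated SHA-256 and of `h % b` as the bin assignment are assumptions made for illustration; the paper's actual hashing scheme may differ. All function names are hypothetical.

```python
import hashlib
from collections import defaultdict

def hash_to_bits(item: bytes, lam: int) -> int:
    """Hash an item to a lam-bit integer (truncated SHA-256; an assumption)."""
    if not 1 <= lam <= 256:
        raise ValueError("lam must be between 1 and 256")
    digest = hashlib.sha256(item).digest()
    return int.from_bytes(digest, "big") >> (256 - lam)

def bin_items(items, lam: int, b: int):
    """Place the lam-bit hash of each item into one of b bins."""
    bins = defaultdict(set)
    for item in items:
        h = hash_to_bits(item, lam)
        bins[h % b].add(h)  # illustrative bin assignment, not the paper's
    return bins

def max_load(bins) -> int:
    """Largest bin size: gamma for the client, mu for the server."""
    return max((len(v) for v in bins.values()), default=0)

def has_collision(items, lam: int) -> bool:
    """True if two distinct items share the same lam-bit hash."""
    distinct = set(items)
    return len({hash_to_bits(x, lam) for x in distinct}) < len(distinct)

if __name__ == "__main__":
    client = [f"client-{i}".encode() for i in range(1024)]
    server = [f"server-{i}".encode() for i in range(100_000)]
    b, lam = 256, 32
    print("gamma:", max_load(bin_items(client, lam, b)))
    print("mu:", max_load(bin_items(server, lam, b)))
    print("collision:", has_collision(client + server, lam))
```

Shrinking `lam` makes `has_collision` more likely to return `True`, which is the failure event the lemma bounds.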

Figures (7)

  • Figure 1: Stages of PEPSI.
  • Figure 2: Client Dataset Preprocessing in PEPSI. $d$ indicates dummy elements, $b$ is the number of bins, and $\gamma$ is the client's maximum bin load. From left to right: symbols represent the real-valued payload that is hashed into bins with a maximum bin load $\gamma$. The payload is encoded into bits using our constant-weight codewords, whereby the same color indicates the same payload, and finally encrypted into a ciphertext by batching across bins. The preprocessing outputs $\ell \cdot \gamma$ ciphertexts.
  • Figure 3: Code length as a function of the Hamming weight for $\bar{\lambda}\in\{16, 32, 48\}$ for $b\leq 4096$. The minimum occurs for a Hamming weight of 8, 8, and 23, respectively.
  • Figure 4: Hamming weight that optimizes communication (blue) and computation (red) as a function of the effective bitlength ($\bar{\lambda}$). We assume $b\leq 4096$ in these graphs.
  • Figure 5: Runtime as a function of the Hamming weight for $\bar{\lambda}\in\{16, 32, 48\}$ for $b\leq 4096$. The minimum occurs for a Hamming weight of 4, 8, and 8, respectively.
  • ...and 2 more figures
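
The constant-weight encoding mentioned in the Figure 2 caption can be illustrated in plaintext. The sketch below assumes the usual definition of a constant-weight code (every codeword has length $\ell$ and exactly $h$ ones) and uses the combinatorial number system to map an integer to a codeword. The function names, the mapping, and the product-based equality check are illustrative assumptions, not the paper's implementation, and `min_code_length` is a plain combinatorial bound that ignores any batching or parameter constraints the paper applies.

```python
from math import comb

def min_code_length(bitlen: int, h: int) -> int:
    """Smallest ell such that C(ell, h) >= 2**bitlen (requires h >= 1)."""
    if h < 1:
        raise ValueError("Hamming weight must be at least 1")
    ell = h
    while comb(ell, h) < (1 << bitlen):
        ell += 1
    return ell

def encode(x: int, ell: int, h: int) -> list:
    """Map x in [0, C(ell, h)) to a distinct ell-bit word with h ones."""
    if not 0 <= x < comb(ell, h):
        raise ValueError("x does not fit in this code")
    bits = [0] * ell
    remaining = h
    # Greedy decomposition in the combinatorial number system.
    for pos in range(ell - 1, -1, -1):
        if remaining == 0:
            break
        c = comb(pos, remaining)
        if x >= c:
            bits[pos] = 1
            x -= c
            remaining -= 1
    return bits

def cw_equal(cw_x, cw_y) -> int:
    """Return 1 if two codewords of equal weight match, else 0.

    Multiplies the bits of cw_x at the positions where cw_y is 1. Under
    homomorphic encryption the same product could be taken over
    ciphertexts; that connection is an assumption of this sketch.
    """
    prod = 1
    for xb, yb in zip(cw_x, cw_y):
        if yb:
            prod *= xb
    return prod

if __name__ == "__main__":
    h = 4
    ell = min_code_length(16, h)
    a, b = encode(1234, ell, h), encode(1235, ell, h)
    print(ell, cw_equal(a, a), cw_equal(a, b))
```

A larger Hamming weight shortens the codeword but lengthens the product in `cw_equal`, which is the kind of trade-off between communication and computation that the Hamming-weight figures explore.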

Theorems & Definitions (12)

  • Lemma 1
  • Theorem 1
  • proof
  • Corollary 1.1
  • Theorem 2
  • proof
  • Lemma 1
  • proof
  • Theorem 3
  • proof
  • ...and 2 more