TETRIS: Composing FHE Techniques for Private Functional Exploration Over Large Datasets

Malika Izabachène, Jean-Philippe Bossuat

TL;DR

The paper addresses private functional exploration over large, sensitive datasets: a scientist queries a data server without learning the data, and the server does not learn the scientist's function. It introduces TETRIS, a practical system that composes homomorphic operations, combining approximate FHE (CKKS) with large-domain function evaluation, to privately compute per-entry scores and then privately aggregate the results via two thresholds. Key innovations include ring repacking, ring merging, and scheme switching to enable bootstrapping for large-domain evaluations, plus a large-domain private function framework inspired by PoPETS that keeps the data in the plaintext domain and the function encrypted. The implementation, run on a synthetic dataset of $p=2^{19}$ entries with $h=16$ features, achieves an amortized processing time of about $1.9$ ms per entry on a single thread. It uses an open-source Lattigo-based pipeline, and comparisons show favorable performance versus sending encrypted databases or UC-based PFE. These results advance privacy-preserving data exploration in large medical datasets and can extend to partitioned databases and other domains requiring private function evaluation over large volumes of data.

Abstract

To derive valuable insights from statistics, machine learning applications frequently analyze substantial amounts of data. In this work, we address the problem of designing efficient and secure techniques to probe large datasets that allow a scientist to conduct large-scale medical studies over specific attributes of patients' records, while maintaining the privacy of the scientist's model. We introduce a set of composable homomorphic operations and show how to combine private function evaluation with private thresholds via approximate fully homomorphic encryption. This allows us to design a new system named TETRIS, which solves the real-world use case of private functional exploration of large databases, where the statistical criteria remain private to the server owning the patients' records. Our experiments show that TETRIS achieves practical performance over a large dataset of patients, even for the evaluation of elaborate statements composed of linear and nonlinear functions. It is possible to extract private insights from a database of hundreds of thousands of patient records within only a few minutes on a single thread, with an amortized time per database entry smaller than 2 ms.

Paper Structure

This paper contains 59 sections, 1 theorem, 21 equations, 2 figures, 4 tables, 2 algorithms.

Key Result

Lemma 1

Let $c_1 = (a_1,b_1), \dots, c_N = (a_N,b_N)$ be encryptions of $m_1(X), \dots, m_N(X)$. The repacking algorithm takes the $c_i$ as inputs and returns an encryption of $\mu(X)=\sum_i \mu_i\cdot X^i$ such that $\mu_i=m_i[0]$.
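
To make the statement concrete, the following is a minimal plaintext sketch of what the lemma's output looks like; it is not the paper's homomorphic algorithm. It assumes that $m_i[0]$ denotes the constant coefficient of $m_i(X)$, that polynomials live in $\mathbb{Z}[X]/(X^n+1)$, and that indices start at 0. The function names are hypothetical.

```python
def negacyclic_shift(poly, k):
    """Multiply poly by X^k in Z[X]/(X^n + 1), where n = len(poly)."""
    n = len(poly)
    out = [0] * n
    for idx, coeff in enumerate(poly):
        pos = idx + k
        # Each wrap past degree n - 1 flips the sign, since X^n = -1.
        sign = -1 if (pos // n) % 2 else 1
        out[pos % n] += sign * coeff
    return out


def repack_plaintext(polys):
    """Return mu with mu[i] = polys[i][0] (0-based indexing is an assumption).

    A homomorphic version has to isolate each constant coefficient without
    decrypting; here it is simply read off the plaintext.
    """
    n = len(polys[0])
    assert len(polys) <= n, "at most n polynomials fit in one output"
    mu = [0] * n
    for i, m in enumerate(polys):
        constant_only = [m[0]] + [0] * (n - 1)
        shifted = negacyclic_shift(constant_only, i)  # m_i[0] * X^i
        mu = [a + b for a, b in zip(mu, shifted)]
    return mu
```

For example, repacking the three polynomials `[5, 1, 2, 3]`, `[7, 9, 9, 9]`, and `[-4, 0, 1, 0]` keeps only their constant terms and places them in coefficients 0, 1, and 2 of the output.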

Figures (2)

  • Figure 1: Illustration of the homomorphic private function evaluation (over selected attributes) phase in TETRIS. At the beginning, the scientist holds the attributes-selection matrix $M$ (in blue) and the scoring function definitions $f_j$, $j\in[0,m-1]$, and the database owner holds the patients database $P$ (colored in green), represented as a matrix of size $p \times h$. The scientist first sends the attributes-selection matrix $M$, defined in the plaintext domain, and the $m$ attribute scoring functions $f_j$, each sent encrypted as a polynomial vector $\mathsf{RLWE}(\bm{u}_{f_{j}})$ (encrypted $\mathsf{testv}$ colored in pink). For the sake of readability, each encrypted polynomial $\mathsf{RLWE}(\bm{u}_{f_{j}})$ is denoted $[\bm{u}_{f_{j}}]$ in the figure. The results of each scoring function are aggregated together to form an encrypted score for each patient, denoted as $[\texttt{score}_{P[j]}]$, for $j\in [0,p-1]$ (colored in pink). These intermediate scoring functions are then packed into an encrypted score over a batch of $\# \text{patients}/2^{12}$ (colored in pink).
  • Figure 2: Illustration of the homomorphic thresholds evaluation phase in TETRIS. This phase is processed on the database owner side. At the beginning, the database owner holds a set of $\#\text{patients}/2^{12}$ ciphertexts, which are merged into a ciphertext in a small ring of dimension $N=2^{12}$ and switched to a larger ring of dimension $2^{16}$ before entering CKKS bootstrapping. The output is two ciphertexts, each encrypting a vector of $N/2=2^{15}$ encoded scores for the $\#\text{patients}/2^{16}$ patients. The database owner then evaluates a first local threshold, denoted as $\mathsf{thres1}\text{-}\mathsf{p}$ in the figure, and aggregates all the encrypted results via homomorphic summation. Finally, the database owner applies the second private global threshold, denoted as $\mathsf{thres1}\text{-}\mathsf{p}$ in the figure, and outputs the encrypted result.
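
The two phases described in these captions can be summarized by a plaintext reference of the functionality, with no encryption involved. This is only a sketch under stated assumptions: $M$ is taken to be an $h \times m$ 0/1 matrix, each threshold is modeled as a "greater than or equal" test, and the final output is a single bit. The function names and argument names are hypothetical and do not come from the paper or from Lattigo.

```python
def patient_scores(P, M, scoring_fns):
    """Phase 1 (Figure 1), in the clear: one aggregated score per patient.

    P           : p x h list of rows (the patients database)
    M           : h x m selection matrix (assumed 0/1 entries)
    scoring_fns : m callables f_j, one per selected attribute
    """
    h, m = len(M), len(scoring_fns)
    scores = []
    for row in P:
        # Column j of M picks out the attribute fed to f_j.
        selected = [sum(row[k] * M[k][j] for k in range(h)) for j in range(m)]
        scores.append(sum(f(x) for f, x in zip(scoring_fns, selected)))
    return scores


def two_threshold_decision(scores, local_threshold, global_threshold):
    """Phase 2 (Figure 2), in the clear: local test, sum, then global test."""
    # Local threshold: 1 if the patient's score passes, else 0 (assumed >=).
    passed = [1 if s >= local_threshold else 0 for s in scores]
    count = sum(passed)  # stands in for the homomorphic summation
    # Global threshold on the aggregated count (assumed >=).
    return count, (1 if count >= global_threshold else 0)
```

In TETRIS the scores and both threshold evaluations stay encrypted; the sketch only fixes what is being computed, which can serve as a baseline when checking the accuracy of an approximate (CKKS) evaluation.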

Theorems & Definitions (1)

  • Lemma 1