TETRIS: Composing FHE Techniques for Private Functional Exploration Over Large Datasets
Malika Izabachène, Jean-Philippe Bossuat
TL;DR
The paper addresses private functional exploration over large, sensitive datasets by allowing a scientist to query a data server without revealing the data or the function. It introduces TETRIS, a practical system that composes homomorphic operations with approximate FHE (CKKS) and large-domain function evaluation to privately compute per-entry scores and then privately aggregate results via two thresholds, all while keeping the function private and the data encrypted from the scientist. Key innovations include ring repacking, ring merging, and scheme switching to enable bootstrapping for large-domain evaluations, plus a large-domain private function framework inspired by PoPETS that keeps the data in the plaintext domain and the function encrypted. The implementation, run on a synthetic dataset of $p=2^{19}$ entries with $h=16$ features, achieves an amortized processing time of about $1.9$ ms per entry on a single thread. It is built as an open-source Lattigo-based pipeline, and comparisons show favorable performance versus sending encrypted databases or UC-based PFE. These results advance privacy-preserving data exploration in large medical datasets and can extend to partitioned databases and other domains requiring private function evaluation over large volumes of data.
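The quoted figures can be turned into a total single-thread running time with a short calculation. The sketch below only multiplies the two numbers stated above ($p=2^{19}$ entries, about $1.9$ ms per entry); the variable names are illustrative, and the result is an estimate derived from the amortized figure, not a measurement reported by the authors.

```python
# Back-of-the-envelope estimate from the figures quoted in the TL;DR.
p = 2 ** 19           # number of database entries (from the text)
ms_per_entry = 1.9    # amortized single-thread cost per entry, in ms (from the text)

total_seconds = p * ms_per_entry / 1000.0
total_minutes = total_seconds / 60.0

print(f"entries: {p}")
print(f"estimated total: {total_seconds:.1f} s (~{total_minutes:.1f} min)")
```

With these inputs the estimate comes to roughly 996 s for the full dataset on one thread.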
Abstract
To derive valuable insights from statistics, machine learning applications frequently analyze substantial amounts of data. In this work, we address the problem of designing efficient secure techniques to probe large datasets which allow a scientist to conduct large-scale medical studies over specific attributes of patients' records, while maintaining the privacy of their model. We introduce a set of composable homomorphic operations and show how to combine private function evaluation with private thresholds via approximate fully homomorphic encryption. This allows us to design a new system named TETRIS, which solves the real-world use case of private functional exploration of large databases, where the statistical criteria remain private to the server owning the patients' records. Our experiments show that TETRIS achieves practical performance over a large dataset of patients even for the evaluation of elaborate statements composed of linear and nonlinear functions. It is possible to extract private insights from a database of hundreds of thousands of patient records within only a few minutes on a single thread, with an amortized time per database entry smaller than 2 ms.
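To make the target functionality concrete, here is a plaintext reference sketch of what such an exploration could compute, with no encryption involved. It is an illustration under stated assumptions, not the TETRIS protocol: the text does not spell out how the two thresholds are used, so the sketch assumes the first is applied to each entry's score and the second to the resulting count. The function `explore`, the sigmoid-of-linear score, and all numeric values are hypothetical.

```python
import math
from typing import Callable, Sequence

def explore(
    db: Sequence[Sequence[float]],
    score: Callable[[Sequence[float]], float],
    t_entry: float,
    t_count: int,
) -> bool:
    """Plaintext reference: score every entry, count the entries whose
    score reaches t_entry, then report whether the count reaches t_count.
    (Assumed reading of the 'two thresholds'; hypothetical helper.)"""
    hits = sum(1 for row in db if score(row) >= t_entry)
    return hits >= t_count

# Hypothetical private model: a linear map followed by a nonlinearity.
weights = [0.5, -0.25]

def sigmoid_score(row: Sequence[float]) -> float:
    z = sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))

# Toy database with two features per entry (made-up values).
db = [[2.0, 0.0], [0.0, 4.0], [4.0, 2.0], [1.0, 1.0]]

result_two = explore(db, sigmoid_score, t_entry=0.6, t_count=2)
result_three = explore(db, sigmoid_score, t_entry=0.6, t_count=3)
print(result_two, result_three)
```

In the setting described above, the model (`weights`, the thresholds) would stay hidden from the server and the records would stay hidden from the scientist; the sketch only fixes the input/output behaviour that a private protocol would need to reproduce.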
