FHAIM: Fully Homomorphic AIM For Private Synthetic Data Generation
Mayank Kumar, Qian Lou, Paulo Barreto, Martine De Cock, Sikha Pentyala
TL;DR
FHAIM tackles input privacy in outsourced synthetic data generation by training a marginal-based generator directly on encrypted tabular data using a CKKS-based fully homomorphic encryption scheme. It realizes DP-in-FHE through novel protocols for marginal computation, encrypted DP noise injection, and encrypted query selection, employing a squared $L_2$ quality score to stabilize the selection step. Empirical results on real datasets demonstrate that FHAIM preserves the utility of the AIM baseline under $(\varepsilon,\delta)$-DP while delivering practical runtimes (approximately 11–30 minutes) and modest memory usage, thereby enabling private SDG as a service in privacy-sensitive domains. This work shows that input privacy and formal DP guarantees can be achieved without multi-party coordination, paving the way for scalable privacy-preserving data sharing in healthcare, finance, and beyond.
Abstract
Data is the lifeblood of AI, yet much of the most valuable data remains locked in silos due to privacy concerns and regulations. As a result, AI remains heavily underutilized in many of the most important domains, including healthcare, education, and finance. Synthetic data generation (SDG), i.e., the generation of artificial data with a synthesizer trained on real data, offers an appealing way to make data available while mitigating privacy concerns. However, existing SDG-as-a-service workflows require data holders to trust providers with access to private data. We propose FHAIM, the first fully homomorphic encryption (FHE) framework for training a marginal-based synthetic data generator on encrypted tabular data. FHAIM adapts the widely used AIM algorithm to the FHE setting using novel FHE protocols, ensuring that the private data remains encrypted throughout and is released only with differential privacy guarantees. Our empirical analysis shows that FHAIM preserves the performance of AIM while maintaining feasible runtimes.
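
To make the three ingredients named above more concrete (marginal computation, DP noise injection, and query selection with a squared $L_2$ quality score), the following is a minimal plaintext sketch in Python. It is not the FHAIM protocol: nothing here is encrypted, and every function name and parameter (`marginal`, `add_gaussian_noise`, `squared_l2_score`, `select_query`, `sigma`, `eps`, `sensitivity`) is hypothetical. The selection step is written as a Gumbel-max form of the exponential mechanism, which is an assumption of this sketch; the `sensitivity` value is a placeholder that a real mechanism would have to derive.

```python
import math
import random
from collections import Counter
from itertools import product


def marginal(rows, attrs, domains):
    """Contingency table over `attrs`, flattened in lexicographic cell order."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    cells = product(*(range(domains[a]) for a in attrs))
    return [counts[cell] for cell in cells]


def add_gaussian_noise(counts, sigma, rng):
    """Add independent Gaussian noise with standard deviation `sigma` to each cell."""
    return [c + rng.gauss(0.0, sigma) for c in counts]


def squared_l2_score(true_counts, estimated_counts):
    """Squared L2 distance between the true marginal and the current estimate."""
    return sum((t - e) ** 2 for t, e in zip(true_counts, estimated_counts))


def select_query(scores, eps, sensitivity, rng):
    """Pick a candidate index by Gumbel-max sampling (assumed selection rule).

    `sensitivity` is a placeholder here, not a value taken from the paper.
    """
    def gumbel():
        u = max(rng.random(), 1e-300)  # keep u strictly positive for the log
        return -math.log(-math.log(u))

    noisy = [eps * s / (2.0 * sensitivity) + gumbel() for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    domains = {"age": 3, "sex": 2, "smoker": 2}  # toy attribute domains
    rows = [{a: rng.randrange(d) for a, d in domains.items()} for _ in range(200)]
    candidates = [("age", "sex"), ("age", "smoker"), ("sex", "smoker")]

    # Score each candidate against a uniform estimate of its marginal.
    scores = []
    for attrs in candidates:
        true_counts = marginal(rows, attrs, domains)
        uniform = [len(rows) / len(true_counts)] * len(true_counts)
        scores.append(squared_l2_score(true_counts, uniform))

    chosen = candidates[select_query(scores, eps=1.0, sensitivity=1.0, rng=rng)]
    measured = add_gaussian_noise(marginal(rows, chosen, domains), sigma=5.0, rng=rng)
    print(chosen, [round(x, 1) for x in measured])
```

In the setting described in the abstract, the counting, scoring, and noise addition would all have to operate on ciphertexts, so that only the differentially private outputs are ever decrypted; this sketch shows only the arithmetic those protocols would need to reproduce.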
