Feasibility of Privacy-Preserving Entity Resolution on Confidential Healthcare Datasets Using Homomorphic Encryption
Yixiang Yao, Joseph Cecil, Praveen Angyan, Neil Bahroos, Srivatsan Ravi
TL;DR
This work tackles privacy-preserving entity resolution (PPER) across confidential healthcare datasets under regulatory constraints. It implements a CKKS-based AMPPERE pipeline, augmented with record-based chunking, SIMD, parallel processing, and tailored parameter tuning, to achieve scalable, accurate matching while the data remain encrypted. Empirical results on mortality-related datasets show near-complete blocking (pair completeness, PC ≈ 100%), a very high reduction in candidate comparisons (reduction ratio, RR ≈ 99.996%), and strong ER performance, with substantial runtime improvements (up to 24x faster than AMPPERE and 447x faster than naive HE). The approach demonstrates practical HIPAA/GDPR-compliant data linkage for healthcare research: institutions can share data securely and collectively decrypt the results without exposing the underlying records.
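To make the two blocking metrics quoted above concrete, the sketch below computes PC and RR using their standard definitions from the entity resolution literature: PC is the fraction of true matching pairs that survive blocking, and RR is one minus the ratio of candidate pairs to all possible cross-dataset pairs. The function names, record identifiers, and toy data are illustrative assumptions, not taken from the paper, and the computation is shown in plaintext rather than under encryption.

```python
from typing import Set, Tuple

# A candidate or true-match pair: (record id in dataset A, record id in dataset B).
Pair = Tuple[str, str]


def pair_completeness(candidates: Set[Pair], true_matches: Set[Pair]) -> float:
    """Fraction of true matching pairs retained after blocking."""
    if not true_matches:
        raise ValueError("true_matches must be non-empty")
    return len(candidates & true_matches) / len(true_matches)


def reduction_ratio(num_candidates: int, size_a: int, size_b: int) -> float:
    """One minus the share of the full cross product that blocking keeps."""
    total_pairs = size_a * size_b
    if total_pairs == 0:
        raise ValueError("both datasets must be non-empty")
    return 1.0 - num_candidates / total_pairs


# Hypothetical toy data: dataset A has 4 records, dataset B has 5.
candidates = {("a1", "b1"), ("a2", "b2"), ("a3", "b1")}
true_matches = {("a1", "b1"), ("a2", "b2")}

pc = pair_completeness(candidates, true_matches)
rr = reduction_ratio(len(candidates), size_a=4, size_b=5)
print(f"PC = {pc:.3f}, RR = {rr:.3f}")
```

On this toy input, blocking keeps both true matches and 3 of the 20 possible pairs. An RR near 99.996%, as reported above, means only about 4 in every 100,000 possible pairs reach the comparison step.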
Abstract
Patient datasets contain confidential information that is protected by laws and regulations such as HIPAA and GDPR. Assembling comprehensive patient information therefore requires privacy-preserving entity resolution (PPER), which identifies records of the same patient across multiple databases held by different healthcare organizations while maintaining data privacy. Existing methods often lack cryptographic security or are computationally impractical for real-world datasets. We introduce a PPER pipeline based on AMPPERE, a secure abstract computation model that uses cryptographic tools such as homomorphic encryption. Our tailored approach incorporates extensive parallelization techniques and parameters optimized specifically for patient datasets. Experimental results demonstrate that the proposed method is both accurate and efficient compared with various baselines.
