Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization
Zahir Alsulaimawi
TL;DR
The paper tackles the problem of balancing data utility and privacy in analytics by casting the privacy-utility trade-off as a mutual-information optimization problem and proposing three algorithms: a Noise-Infusion technique for high-dimensional data, a Variational Autoencoder (VAE) approach, and an Expectation-Maximization (EM) method. It contributes theoretical results, including information-theoretic bounds and convergence guarantees for alternating optimization, together with a practical framework that integrates variational inference with EM to improve optimization. Empirical validation on Modified MNIST, CelebrityA, and a structured dataset shows stronger privacy protection than basic privacy techniques while retaining utility. The work offers a flexible, theory-grounded pathway for deploying privacy-preserving analytics across diverse data types and applications.
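To make "privacy-utility as a mutual-information optimization" concrete, one common way to write such an objective is shown below. This is a generic formulation assumed for illustration, not necessarily the paper's exact objective: $X$ is the raw data, $S$ the sensitive attribute, $U$ a useful (task) attribute, $Y$ the released representation produced by a randomized mapping $p(y \mid x)$, and $\beta \ge 0$ a hypothetical trade-off weight.

```latex
\min_{p(y \mid x)} \; \mathcal{L}\big(p(y \mid x)\big) = I(S;Y) - \beta\, I(U;Y),
\qquad \text{subject to the Markov chain } (S,U) \to X \to Y .
```

Small $\beta$ favors privacy (driving $I(S;Y)$ toward zero), while large $\beta$ favors utility (preserving $I(U;Y)$). An equivalent constrained form minimizes $I(S;Y)$ subject to $I(U;Y) \ge R$ for a required utility level $R$.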
Abstract
This study develops a novel framework for privacy-preserving data analytics that addresses the critical challenge of balancing data utility against privacy. We introduce three algorithms: a Noise-Infusion Technique tailored to high-dimensional image data, a Variational Autoencoder (VAE) that extracts robust features while masking sensitive attributes, and an Expectation-Maximization (EM) approach optimized for structured-data privacy. Applied to datasets such as Modified MNIST and CelebrityA, our methods significantly reduce the mutual information between sensitive attributes and the transformed data, thereby enhancing privacy. Our experimental results show that these approaches achieve superior privacy protection while retaining high utility, making them viable for practical applications where both aspects are crucial. The research contributes a flexible and effective strategy for deploying privacy-preserving algorithms across various data types and establishes new benchmarks for utility and confidentiality in data analytics.
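The claim that noise infusion reduces the mutual information between a sensitive attribute and the transformed data can be illustrated with a toy discrete example. The sketch below is not the paper's Noise-Infusion Technique (which targets high-dimensional images). It assumes a binary sensitive attribute `S`, a released value `Y` that copies `S` but is flipped with probability `flip_prob`, and computes the exact `I(S;Y)` from the joint probability table. The function names `mutual_information` and `noisy_release_joint` are hypothetical helpers introduced here.

```python
import math


def mutual_information(joint):
    """Exact mutual information I(S;Y) in bits from a joint probability table joint[s][y]."""
    p_s = [sum(row) for row in joint]          # marginal P(S = s)
    p_y = [sum(col) for col in zip(*joint)]    # marginal P(Y = y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (p_s[i] * p_y[j]))
    return mi


def noisy_release_joint(p_s, flip_prob):
    """Joint P(S, Y) when Y copies binary S but is flipped with probability flip_prob (assumed noise model)."""
    return [
        [p_s[s] * ((1 - flip_prob) if y == s else flip_prob) for y in (0, 1)]
        for s in (0, 1)
    ]


if __name__ == "__main__":
    # More infused noise -> less information about S leaks through Y.
    for flip in (0.0, 0.1, 0.3, 0.5):
        joint = noisy_release_joint([0.5, 0.5], flip)
        print(f"flip={flip:.1f}  I(S;Y)={mutual_information(joint):.3f} bits")
```

For a uniform binary `S`, the result matches the closed form $1 - H_b(p)$, where $H_b$ is the binary entropy: one full bit leaks with no noise, and nothing leaks at `flip_prob = 0.5`, at which point `Y` is also useless for any task that depends on `S`. This is the trade-off the framework aims to manage.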
