Building a Privacy Web with SPIDEr -- Secure Pipeline for Information De-Identification with End-to-End Encryption
Novoneel Chakraborty, Anshoo Tandon, Kailash Reddy, Kaushal Kirpekar, Bryan Paul Robert, Hari Dilip Kumar, Abhilash Venkatesh, Abhay Sharma
TL;DR
SPIDEr addresses secure, privacy-preserving data de-identification in cloud settings by combining TEEs with end-to-end encryption, so that unencrypted data is never accessible to service providers. It combines classical de-identification techniques (suppression, generalisation, pseudonymisation) with techniques that carry formal privacy guarantees (k-anonymity and differential privacy), and it optimises DP workloads through batch processing on constrained TEEs. The system enforces secure end-to-end control flows, including attestation via the Azure Attestation service and a multi-component enclave-management/authentication stack, plus Docker-based cloud deployment. By exposing explicit privacy-utility tradeoffs through parameters such as $\varepsilon$, with noise calibrated to the query sensitivity $\Delta_f$, SPIDEr provides practical, scalable protection for data releases while reducing the risk of leakage and enabling compliant analytics.
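To illustrate how $\varepsilon$ and $\Delta_f$ interact, consider the standard Laplace mechanism. This is a textbook example and an assumption on our part, since this summary does not state which noise mechanism SPIDEr uses. For a numeric query $f$ and neighbouring datasets $D \sim D'$ that differ in one record:

$$
\Delta_f = \max_{D \sim D'} \left| f(D) - f(D') \right|, \qquad M(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta_f}{\varepsilon}\right)
$$

The added noise has standard deviation $\sqrt{2}\,\Delta_f/\varepsilon$, so halving $\varepsilon$ doubles the typical error. For a counting query ($\Delta_f = 1$) with $\varepsilon = 0.5$, the noise scale is $2$ and the standard deviation is about $2.83$.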
Abstract
Data de-identification makes it possible to glean insights from data while preserving user privacy. The use of Trusted Execution Environments (TEEs) allows de-identification applications to run in the cloud without requiring the user to trust the third-party application provider. In this paper, we present \textit{SPIDEr - Secure Pipeline for Information De-Identification with End-to-End Encryption}, our implementation of an end-to-end encrypted data de-identification pipeline. SPIDEr supports classical anonymisation techniques such as suppression, pseudonymisation, generalisation, and aggregation, as well as techniques that offer a formal privacy guarantee, such as k-anonymisation and differential privacy. To scale and to improve performance on constrained TEE hardware, we support batch processing of data for differential privacy computations. We present our design of the control flows for end-to-end secure execution of de-identification operations within a TEE. As part of the control flow for running SPIDEr within the TEE, we perform attestation, a process that verifies that the software binaries were properly instantiated on a known, trusted platform.
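The batch-processing idea for differential privacy can be sketched as follows. This is a minimal illustration under stated assumptions, not SPIDEr's actual implementation: the function name `batched_dp_sum`, the clipping bounds, the add/remove-one-record notion of neighbouring datasets, and the choice of the Laplace mechanism are all hypothetical. The point is that only one batch needs to be held in memory at a time, and noise is added once to the final aggregate so that the privacy budget $\varepsilon$ is spent once.

```python
import numpy as np


def batched_dp_sum(batches, lower, upper, epsilon, rng=None):
    """Return a noisy sum of all values, reading the data one batch at a time.

    Hypothetical helper for illustration only.
    Assumes neighbouring datasets differ by adding or removing one record,
    so the sensitivity of the clipped sum is max(|lower|, |upper|).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    rng = np.random.default_rng() if rng is None else rng

    total = 0.0
    for batch in batches:
        # Clip each value so that one record's contribution is bounded.
        clipped = np.clip(np.asarray(batch, dtype=float), lower, upper)
        # Only the running total is kept between batches.
        total += float(clipped.sum())

    sensitivity = max(abs(lower), abs(upper))
    # Laplace noise is added once, to the final aggregate.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return total + noise


if __name__ == "__main__":
    # Batches can come from any iterable, e.g. a generator reading file chunks.
    data = ([1.0, 2.0, 3.0], [4.0, 50.0])
    rng = np.random.default_rng(seed=0)
    print(batched_dp_sum(data, lower=0.0, upper=10.0, epsilon=1.0, rng=rng))
```

Adding noise per batch instead would either consume additional privacy budget or require splitting $\varepsilon$ across batches, which is why this sketch defers the noise to the end.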
