Styx: Collaborative and Private Data Processing With TEE-Enforced Sticky Policy

Shixuan Zhao, Weicheng Wang, Ninghui Li, Zhiqiang Lin

Abstract

Protecting sensitive information in data-driven collaborations, such as AI training, while meeting the diverse requirements of multiple mutually distrusting stakeholders is both crucial and challenging. This paper presents Styx, a novel framework that addresses this challenge by integrating sticky policies with Trusted Execution Environments (TEEs). At a high level, Styx employs hardware-TEE-protected middleware with a programming language runtime to form a sandboxed environment for both data processing and policy enforcement. We designed a data processing workflow and pipelines that enable strong yet flexible data-specific policy enforcement throughout the entire data lifecycle and across data derivation, achieving data-in-use protection, data lifecycle protection, and dynamic collaboration. We implemented Styx and demonstrated its ability to make collaborative computing, such as joint AI training, more secure, privacy-preserving, and policy-compliant. Our evaluation shows that the performance overheads imposed by Styx are reasonable for single-node computation and that Styx is capable of scaling to large distributed multi-node deployments.

Paper Structure

This paper contains 35 sections, 10 figures, 1 table, 2 algorithms.

Figures (10)

  • Figure 1: A motivating scenario in which hospitals jointly train a cancer classification model, fine-tune the model, and query the model.
  • Figure 2: A BNF specification for Pad. $\langle\hbox{encrypted-payload}\rangle$ is obtained from aes-encrypt$_{\hbox{datakey}}$($\langle\hbox{plaintext-payload}\rangle$).
  • Figure 3: The workflow of Styx.
  • Figure 4: Data and policy derivation workflow. Green background means trusted domain. Gray background means untrusted domain (sandboxes or outside of TEE).
  • Figure 5: The format of Pad in our implementation. The left side shows the view of a Pad before decryption.
  • ...and 5 more figures