Privacy preserving layer partitioning for Deep Neural Network models

Kishore Rajasekar, Randolph Loh, Kar Wai Fok, Vrizlynn L. L. Thing

TL;DR

The paper addresses private inference in MLaaS environments with a layer-partitioning approach: the initial, sensitive layers run inside an SGX enclave, and the remaining layers are offloaded to a GPU to improve performance. It evaluates privacy with c-GAN reconstruction attacks to identify suitable partition points for VGG-16, ResNet-50, and EfficientNetB0 on the ImageNet and TON_IoT image datasets, and reports substantial speedups while maintaining privacy constraints. The key contributions are a detailed runtime analysis across multiple architectures, a privacy assessment framework based on c-GANs, and practical guidance on partition points that balance speed against reconstructability in real-world deployments. The work supports secure, efficient cloud-based inference by quantifying model-dependent tradeoffs and demonstrating viable privacy-preserving configurations that combine TEEs with accelerator offloading.

Abstract

MLaaS (Machine Learning as a Service) has become popular in the cloud computing domain, allowing users to leverage cloud resources to run private inference of ML models on their data. However, ensuring user input privacy and secure inference execution is essential. One approach to protecting data privacy and integrity is to use Trusted Execution Environments (TEEs), which execute programs inside a secure hardware enclave. Using TEEs can introduce significant performance overhead due to the additional encryption, decryption, security, and integrity checks, which can lead to slower inference than on unprotected hardware. In our work, we improve the runtime performance of ML models by introducing a layer partitioning technique and offloading computations to a GPU. The technique splits the model into two partitions: one executed within the TEE, and the other executed on a GPU accelerator. Layer partitioning exposes intermediate feature maps in the clear, which can enable reconstruction attacks that recover the input. We conduct experiments to demonstrate the effectiveness of our approach in protecting against input reconstruction attacks built with a trained conditional Generative Adversarial Network (c-GAN). The evaluation is performed on the widely used models VGG-16, ResNet-50, and EfficientNetB0, using two datasets: ImageNet for image classification and the TON_IoT dataset for cybersecurity attack detection.
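
The two-partition scheme described in the abstract can be illustrated with a minimal, framework-free sketch. Everything in it is an assumption made for illustration: the toy layers (`scale`, `shift`, `relu`), the `MODEL` list, and the helper names `partition` and `partitioned_inference` are hypothetical and are not the authors' implementation, and no real enclave or GPU is involved. The sketch only shows that splitting a sequential model at index `k` preserves the output, and that the intermediate feature map at the split is the value that leaves the protected partition in the clear.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale(factor: float) -> Layer:
    """Hypothetical stand-in for a linear layer: multiply every element."""
    return lambda x: [factor * v for v in x]


def shift(offset: float) -> Layer:
    """Hypothetical stand-in for a bias term: add a constant to every element."""
    return lambda x: [v + offset for v in x]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the layers in order to the input."""
    for layer in layers:
        x = layer(x)
    return x


def partition(layers: Sequence[Layer], k: int) -> Tuple[List[Layer], List[Layer]]:
    """Split the model so that the first k layers form the protected partition."""
    if not 0 < k < len(layers):
        raise ValueError("k must leave at least one layer in each partition")
    return list(layers[:k]), list(layers[k:])


def partitioned_inference(
    layers: Sequence[Layer], k: int, x: Vector
) -> Tuple[Vector, Vector]:
    """Return (exposed feature map, final output) for a split at layer k."""
    protected, offloaded = partition(layers, k)
    # In a real deployment this call would execute inside the TEE.
    feature_map = run(protected, x)
    # The feature map crosses the enclave boundary unencrypted here; this is
    # the value a reconstruction attack would try to invert.
    output = run(offloaded, feature_map)  # would execute on the GPU
    return feature_map, output


# Toy five-layer model used only to exercise the split.
MODEL: List[Layer] = [scale(2.0), shift(-1.0), relu, scale(0.5), shift(3.0)]

if __name__ == "__main__":
    sample = [1.0, -2.0, 0.25]
    for k in range(1, len(MODEL)):
        exposed, result = partitioned_inference(MODEL, k, sample)
        print(k, exposed, result)
```

A larger `k` keeps more layers in the protected partition, which in the paper's framing trades runtime for a feature map that is harder to invert; the sketch does not model that tradeoff, only the mechanics of the split.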

Paper Structure

This paper contains 31 sections, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: Secure system framework to perform private inference
  • Figure 2: Model architectures
  • Figure 3: Average inference runtime of DNN models for different layer partitions
  • Figure 4: Reconstructed images from intermediate feature maps of different layer partitions in VGG-16
  • Figure 5: Reconstructed images from intermediate feature maps of different layer partitions in ResNet-50
  • ...and 3 more figures