$\Lambda$-Split: A Privacy-Preserving Split Computing Framework for Cloud-Powered Generative AI

Shoki Ohta; Takayuki Nishio

TL;DR

This work introduces $\Lambda$-Split, a split computing framework that enables computational offloading while strengthening data privacy against risks such as eavesdropping and unauthorized access, and empirically validates the framework using Llama 2 and Stable Diffusion XL.

Abstract

With the rapid expansion of generative artificial intelligence (AI) services, the computational demands of these technologies frequently necessitate cloud-powered computational offloading, particularly for resource-constrained mobile devices. These services commonly use prompts to steer the generative process, and both the prompts and the resulting content, such as text and images, may contain privacy-sensitive or confidential information, which raises security and privacy risks. To mitigate these concerns, we introduce $\Lambda$-Split, a split computing framework that enables computational offloading while strengthening data privacy against risks such as eavesdropping and unauthorized access. In $\Lambda$-Split, a generative model, usually a deep neural network (DNN), is partitioned into three sub-models and distributed across the user's local device and a cloud server: the input-side and output-side sub-models are allocated to the local device, while the intermediate, computationally intensive sub-model resides on the cloud server. This architecture ensures that only hidden-layer outputs are transmitted, thereby preventing the external transmission of privacy-sensitive raw input and output data. Given the black-box nature of DNNs, estimating the original input or output from intercepted hidden-layer outputs poses a significant challenge for malicious eavesdroppers. Moreover, $\Lambda$-Split is orthogonal to traditional encryption-based security mechanisms and offers enhanced security when deployed in conjunction with them. We empirically validate the efficacy of the $\Lambda$-Split framework using Llama 2 and Stable Diffusion XL, representative large language and diffusion models developed by Meta and Stability AI, respectively. Our $\Lambda$-Split implementation is publicly accessible at https://github.com/nishio-laboratory/lambda_split.
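The three-way partition described in the abstract can be sketched with a toy model. This is a minimal illustration and not the authors' implementation: the layers are hypothetical element-wise functions rather than Transformer or U-Net blocks, and the names `make_layer`, `run`, and `lambda_split` are invented for this example. It shows that running the input-side and output-side sub-models locally and the middle sub-model remotely reproduces the unsplit model's output, while only hidden-layer vectors enter the list that stands in for the network.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float, shift: float) -> Layer:
    """Return a toy layer: element-wise affine map followed by ReLU."""
    def layer(x: Vector) -> Vector:
        return [max(0.0, scale * v + shift) for v in x]
    return layer


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the given layers in order and return the result."""
    for layer in layers:
        x = layer(x)
    return x


def lambda_split(
    layers: Sequence[Layer], x_idx: int, y_idx: int
) -> Tuple[Sequence[Layer], Sequence[Layer], Sequence[Layer]]:
    """Split a layer stack into (local head, cloud middle, local tail)."""
    if not 0 < x_idx < y_idx < len(layers):
        raise ValueError("need 0 < x_idx < y_idx < number of layers")
    return layers[:x_idx], layers[x_idx:y_idx], layers[y_idx:]


# Hypothetical 6-layer model; the split points 1 and 5 are arbitrary choices.
layers = [make_layer(1.0 + 0.1 * i, 0.5 - 0.2 * i) for i in range(6)]
head, middle, tail = lambda_split(layers, 1, 5)

raw_input = [0.3, -1.2, 2.0]
transmitted = []  # stands in for everything that crosses the network

uplink = run(head, raw_input)    # computed on the local device
transmitted.append(uplink)       # local -> cloud: hidden vector only
downlink = run(middle, uplink)   # computed on the cloud server
transmitted.append(downlink)     # cloud -> local: hidden vector only
output = run(tail, downlink)     # computed on the local device

full = run(layers, raw_input)    # reference: the unsplit model
```

Here the cloud holds four of the six layers, yet neither `raw_input` nor `output` is ever appended to `transmitted`. How hard it is to invert the hidden vectors depends on the real model and is not something this toy example demonstrates.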

Paper Structure (12 sections, 5 figures, 1 table)

Introduction
System model and general framework
$\Lambda$-Split for Large Language Model-based Text Generation
Splitting LLM
Communication Traffic Reduction
Experimental Evaluation
$\Lambda$-Split for Diffusion Model-based Image Generation
Splitting Diffusion Models
Communication Traffic Reduction
Experimental Evaluation
Challenges and Potential Research Opportunities
Conclusion

Figures (5)

Figure 1: Schematic representation of a generative model implemented using cloud-only inference, standard split computing (SC) [matsubara:sc_survey], and the $\Lambda$-Split framework. $\Lambda$-Split enhances privacy preservation because all transmitted data consists of hidden-layer outputs of the DNN, while still leveraging the computational power of the cloud. The name $\Lambda$-Split reflects this data flow pattern: the local-to-cloud-to-local fold-back trajectory resembles the Greek letter $\Lambda$.
Figure 2: Conventional cloud-only inference and $\Lambda$-Split for LLM inference. The $N$ stacked Transformer decoder blocks are split into three sub-models because the decoder blocks are computationally expensive. The LLM is split between the $(X-1)$-th and $X$-th decoder layers, and between the $(Y-1)$-th and $Y$-th decoder layers.
Figure 3: Example of eavesdropped, unencrypted HTTP packets visualized using Wireshark. The lower-left windows show the decoded data in the HTTP packets transmitted from the cloud to the local device. In cloud-only inference, the input prompts and generated text can be read directly from the eavesdropped packets. Conversely, in $\Lambda$-Split, the decoded data consists of a byte stream representing the DNN's hidden output vector, which obfuscates the semantic content and complicates unauthorized interpretation.
Figure 4: Conventional cloud-only inference and $\Lambda$-Split for latent diffusion model (LDM) inference. The LDM is split into three parts so that the computationally expensive U-Net is located in the cloud and most of the transmitted data is noise.
Figure 5: Example denoising process, transmitted data, and generated images in $\Lambda$-Split for LDMs. The majority of the transmitted data is Gaussian noise. There is a trade-off between traffic volume and the quality of the generated images.
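The contrast drawn in Figure 3 between a readable prompt and a byte stream of hidden values can be illustrated with a short sketch. It rests on several assumptions: the float16 little-endian wire format, the helper names `pack_hidden`, `unpack_hidden`, and `payload_size`, and the sample values are all hypothetical, and the serialization actually used in the released implementation is not described in this section. The hidden size of 4096 is that of Llama 2 7B and is used only as an illustrative figure.

```python
import struct
from typing import List


def pack_hidden(hidden: List[float]) -> bytes:
    """Serialize a hidden vector as little-endian float16 (assumed format)."""
    return struct.pack(f"<{len(hidden)}e", *hidden)


def unpack_hidden(payload: bytes) -> List[float]:
    """Inverse of pack_hidden: recover the float values from the bytes."""
    count = len(payload) // 2  # two bytes per float16 value
    return list(struct.unpack(f"<{count}e", payload))


def payload_size(hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to send one hidden vector in one direction."""
    return hidden_size * bytes_per_value


# Cloud-only inference: the prompt itself is the payload and decodes to text.
prompt = "my account number is 1234"
cloud_only_payload = prompt.encode("utf-8")

# Split inference: the payload is a vector of hidden activations.
# These values are made up and chosen to be exactly representable in float16.
hidden = [0.5, -1.25, 3.0, 0.0078125]
payload = pack_hidden(hidden)

# Per-token traffic for a 4096-dimensional hidden state sent as float16.
bytes_per_token = payload_size(4096)
```

An eavesdropper who decodes `cloud_only_payload` recovers the prompt verbatim, whereas decoding `payload` yields only numbers whose meaning depends on the model's internal representation. The sketch says nothing about how resistant real hidden states are to inversion.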
