SentinelLMs: Encrypted Input Adaptation and Fine-tuning of Language Models for Private and Secure Inference

Abhijit Mishra, Mingda Li, Soham Deo

TL;DR

The paper tackles privacy risks in server-based language-model inference by enabling models to process encrypted inputs without expensive re-training of the base model. It introduces a lightweight pipeline that adapts pre-trained transformers through passkey-based tokenizer encryption, distance-preserving transformations of the token embeddings via glide reflections, and vocabulary reindexing, followed by fine-tuning on encrypted task data. Empirically, adapted models achieve parity with their originals across the GLUE, CoNLL-2003, and XNLI benchmarks, and experiments show that tokens are negligibly recoverable from the transformed embeddings and that security increases with more aggressive transformations. The approach offers a practical pathway to secure, private NLP inference with minimal overhead, with potential extensions to generative models and more advanced cryptographic protections.
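To make the vocabulary-reindexing idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`rng_from_passkey`, `reindex_vocabulary`), the SHA-256 seeding of the random generator, and the toy embedding matrix are all hypothetical. The sketch shows only that permuting token ids and embedding rows with the same passkey-derived permutation leaves every lookup unchanged.

```python
import hashlib

import numpy as np


def rng_from_passkey(passkey: str) -> np.random.Generator:
    # Assumption: derive a deterministic seed by hashing the passkey.
    digest = hashlib.sha256(passkey.encode("utf-8")).digest()
    return np.random.default_rng(int.from_bytes(digest[:8], "big"))


def reindex_vocabulary(embeddings: np.ndarray, passkey: str):
    """Return (perm, shuffled), where perm maps old token id -> new token id."""
    vocab_size = embeddings.shape[0]
    perm = rng_from_passkey(passkey).permutation(vocab_size)
    shuffled = np.empty_like(embeddings)
    # Row i of the original matrix moves to row perm[i].
    shuffled[perm] = embeddings
    return perm, shuffled


# Toy vocabulary of 6 tokens with 2-dimensional embeddings.
embeddings = np.arange(12, dtype=np.float64).reshape(6, 2)
perm, shuffled = reindex_vocabulary(embeddings, "hypothetical-passkey")

token_ids = np.array([0, 3, 5])   # ids produced by the original tokenizer
encrypted_ids = perm[token_ids]   # ids the server would see instead
```

Looking up `encrypted_ids` in `shuffled` returns the same vectors as looking up `token_ids` in `embeddings`, so the model's computation is unaffected while the ids sent to the server no longer match the public vocabulary.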

Abstract

This paper addresses the privacy and security concerns associated with deep neural language models, which serve as crucial components in various modern AI-based applications. These models are often used after being pre-trained and fine-tuned for specific tasks, with deployment on servers accessed through the internet. However, this introduces two fundamental risks: (a) the transmission of user inputs to the server via the network gives rise to interception vulnerabilities, and (b) privacy concerns emerge as organizations that deploy such models store user data with restricted context. To address this, we propose a novel method to adapt and fine-tune transformer-based language models on passkey-encrypted user-specific text. The original pre-trained language model first undergoes a quick adaptation (without any further pre-training) with a series of irreversible transformations applied to the tokenizer and token embeddings. This enables the model to perform inference on encrypted inputs while preventing reverse engineering of text from model parameters and intermediate outputs. After adaptation, models are fine-tuned on encrypted versions of existing training datasets. Experimental evaluation employing adapted versions of renowned models (e.g., BERT, RoBERTa) across established benchmark English and multilingual datasets for text classification and sequence labeling shows that encrypted models achieve performance parity with their original counterparts. This serves to safeguard performance, privacy, and security cohesively.

Paper Structure

This paper contains 16 sections, 2 equations, 2 figures, 4 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Illustration of the Workflow: User-Initiated Password-Driven Language Adaptation and Fine-Tuning Process. (a) Initial phase where a user-generated passkey initiates a one-time language adaptation. (b) Subsequent stage involving a one-time fine-tuning process. (c) Run-time scenario showcasing server-side inference on encrypted user input.
  • Figure 2: 2D plot of embeddings for 100 random tokens from the original bert-base-uncased model (red dots) and the same tokens from the transformed model after 10 iterations of glide reflection (blue dots). The plot further illustrates that the sample tokens "vocalists" and "involved" have altered positions but a preserved spatial distance (dashed lines).
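
The distance-preserving property described for Figure 2 can be checked numerically. The sketch below is a hedged illustration rather than the paper's procedure: it models one glide reflection as a Householder reflection across a random hyperplane through the origin, followed by a translation parallel to that hyperplane. The function names, the passkey-derived seed, and the random stand-in embeddings (100 tokens, 16 dimensions) are assumptions made for the example.

```python
import hashlib

import numpy as np


def seed_from_passkey(passkey: str) -> int:
    # Assumption: hash the passkey to obtain a deterministic integer seed.
    digest = hashlib.sha256(passkey.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def glide_reflect(embeddings: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Reflect across a random hyperplane, then translate parallel to it."""
    dim = embeddings.shape[1]
    normal = rng.standard_normal(dim)
    normal /= np.linalg.norm(normal)
    # Householder reflection: x -> x - 2 (x . n) n
    reflected = embeddings - 2.0 * np.outer(embeddings @ normal, normal)
    shift = rng.standard_normal(dim)
    shift -= (shift @ normal) * normal  # remove the component along the normal
    return reflected + shift


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)


rng = np.random.default_rng(seed_from_passkey("hypothetical-passkey"))
original = np.random.default_rng(0).standard_normal((100, 16))

transformed = original
for _ in range(10):  # 10 iterations, matching the figure caption
    transformed = glide_reflect(transformed, rng)

distances_preserved = np.allclose(
    pairwise_distances(original), pairwise_distances(transformed)
)
```

Because each step is an isometry, the composition of all 10 steps is one as well: individual vectors move, but every pairwise distance stays the same up to floating-point error, which is what the dashed lines in the figure depict.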