BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints

Waris Gill, Natalie Isak, Matthew Dressman

TL;DR

BinaryShield transforms suspicious prompts through a pipeline of PII redaction, semantic embedding, binary quantization, and a randomized response mechanism, generating fingerprints that preserve attack patterns while protecting user privacy.

Abstract

The widespread deployment of LLMs across enterprise services has created a critical security blind spot. Organizations operate multiple LLM services handling billions of queries daily, yet regulatory compliance boundaries prevent these services from sharing threat intelligence about prompt injection attacks, the top security risk for LLMs. When an attack is detected in one service, the same threat may persist undetected in others for months, because privacy regulations prohibit sharing user prompts across compliance boundaries. We present BinaryShield, the first privacy-preserving threat intelligence system that enables secure sharing of attack fingerprints across compliance boundaries. BinaryShield transforms suspicious prompts through a pipeline of PII redaction, semantic embedding, binary quantization, and a randomized response mechanism, generating fingerprints that preserve attack patterns while protecting user privacy. Our evaluations show that BinaryShield achieves an F1-score of 0.94, significantly outperforming SimHash (0.77), the privacy-preserving baseline, while reducing storage and delivering 38x faster similarity search compared to dense embeddings.
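
The last two stages of the pipeline described above, binary quantization and randomized response, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The sign-threshold quantizer, the keep probability p = e^alpha / (1 + e^alpha) (the textbook randomized-response form, with alpha named after the privacy parameter in the Figure 4 caption), the helper names, and the random stand-in embedding are all assumptions. PII redaction and the embedding model are omitted because the abstract does not specify them.

```python
import math
import random


def binarize(embedding):
    """Sign-quantize a dense embedding: 1 where a component is positive, else 0.

    Assumption: a simple zero threshold; the paper may quantize differently.
    """
    return [1 if x > 0 else 0 for x in embedding]


def keep_probability(alpha):
    """Probability of reporting a bit truthfully.

    Assumption: the standard randomized-response form e^alpha / (1 + e^alpha).
    """
    return math.exp(alpha) / (1.0 + math.exp(alpha))


def randomized_response(bits, alpha, rng):
    """Keep each bit with probability p and flip it with probability 1 - p."""
    p = keep_probability(alpha)
    return [b if rng.random() < p else 1 - b for b in bits]


# Hypothetical stand-in for the semantic embedding of a redacted prompt.
rng = random.Random(0)
embedding = [rng.gauss(0.0, 1.0) for _ in range(64)]

bits = binarize(embedding)
fingerprint = randomized_response(bits, alpha=2.0, rng=rng)
```

Under this assumed form, a larger alpha pushes p toward 1, so fewer bits flip and the fingerprint stays closer to the quantized embedding; alpha = 0 gives p = 0.5, which makes every bit a coin flip.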

Paper Structure

This paper contains 26 sections, 6 equations, 14 figures, 1 table, 1 algorithm.

Figures (14)

  • Figure 1: BinaryShield system design. Suspicious prompts are processed within the compliance boundary to generate privacy-preserving fingerprints, which are then shared across services for collaborative threat detection.
  • Figure 2: BinaryShield approach for privacy-preserving fingerprint generation for cross-service threat intelligence sharing. The pipeline transforms potential prompt injections through PII redaction, semantic embedding, binary quantization, and differential privacy to produce shareable fingerprints that preserve privacy while enabling threat correlation across compliance boundaries.
  • Figure 3: Distribution of PII entities in prompts flagged by BinaryShield. Most common: person names.
  • Figure 4: Impact of BinaryShield's privacy parameter ($\alpha$). As $\alpha$ increases, the privatized vectors (red triangles) move closer to the original vector, showing improved utility while differential privacy is maintained through controlled bit-flipping noise (each bit flips with probability $1-p$).
  • Figure 5: BinaryShield performance analysis across attack variants with $\alpha = 2.0$. (a) Precision-recall curves demonstrate consistent performance across variants. (b) Confusion matrices at optimal Hamming distance thresholds show TP, TN, FP, and FN counts, indicating strong detection capabilities even under complex paraphrasing attacks. Note that PI in the confusion matrices stands for Prompt Injection and is used to distinguish between attack and benign prompts.
  • ...and 9 more figures
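
The Hamming-distance thresholding mentioned in the Figure 5 caption could look something like the sketch below. It assumes fingerprints are packed into integers and compared with XOR plus a bit count; the function names, the 8-bit toy fingerprints, and the threshold of 2 are hypothetical and chosen only for illustration, not taken from the paper.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one int (first bit is most significant)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def hamming_distance(a, b):
    """Number of bit positions in which two packed fingerprints differ."""
    return bin(a ^ b).count("1")


def matches_known_attack(fingerprint, known_fingerprints, threshold):
    """True if any stored fingerprint is within `threshold` bits of the query."""
    return any(
        hamming_distance(fingerprint, known) <= threshold
        for known in known_fingerprints
    )


# Toy 8-bit fingerprints; real fingerprints would be far longer.
known = [pack_bits([1, 0, 1, 1, 0, 0, 1, 0])]
query = pack_bits([1, 0, 1, 0, 0, 0, 1, 0])   # differs from the stored one in 1 bit
benign = pack_bits([0, 1, 0, 0, 1, 1, 0, 1])  # differs in all 8 bits

is_attack = matches_known_attack(query, known, threshold=2)
is_benign_flagged = matches_known_attack(benign, known, threshold=2)
```

Comparing packed bits with XOR and a population count is one plausible reason binary fingerprints search faster than dense embeddings, since each comparison avoids floating-point arithmetic.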