A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks

Adam Swanda, Amy Chang, Alexander Chen, Fraser Burch, Paul Kassianik, Konstantin Berlin

TL;DR

The paper tackles the vulnerability of Large Language Models to evolving attacks, including zero-days, by proposing a production-grade defense platform that enables rapid, end-to-end protection. It introduces a triad architecture comprising a Threat Intelligence Platform (IntelOps) for rapid threat-to-signature translation, a Data Platform for multi-source data correlation and labeling, and a Release Platform for immutable, multi-version deployments with shadow testing. The main contributions are: (1) an automated threat intelligence operations workflow with prioritized, attack-to-signature translations; (2) a unified data-correlation framework with activity-wide release gating and golden-label production; (3) an immutable multi-version deployment model that supports safe, rapid updates and rollbacks; and (4) an end-to-end rapid-response pipeline from threat ingestion to production deployment. The approach aims to shrink the observe-orient-decide-act (OODA) loop, delivering practical, enterprise-scale protection against both known and novel LLM threats and providing a blueprint for robust AI security in production environments.

Abstract

The widespread adoption of Large Language Models (LLMs) has revolutionized AI deployment, enabling autonomous and semi-autonomous applications across industries through intuitive language interfaces and continuous improvements in model development. However, the attendant increase in autonomy and expansion of access permissions among AI applications also make these systems compelling targets for malicious attacks. Their inherent susceptibility to security flaws necessitates robust defenses, yet no known approaches can prevent zero-day or novel attacks against LLMs. This places AI protection systems in a category similar to established malware protection systems: rather than providing guaranteed immunity, they minimize risk through enhanced observability, multi-layered defense, and rapid threat response, supported by a threat intelligence function designed specifically for AI-related threats. Prior work on LLM protection has largely evaluated individual detection models rather than end-to-end systems designed for continuous, rapid adaptation to a changing threat landscape. We present a production-grade defense system rooted in established malware detection and threat intelligence practices. Our platform integrates three components: a threat intelligence system that turns emerging threats into protections; a data platform that aggregates and enriches information while providing observability, monitoring, and ML operations; and a release platform enabling safe, rapid detection updates without disrupting customer workflows. Together, these components deliver layered protection against evolving LLM threats while generating training data for continuous model improvement and deploying updates without interrupting production.

Paper Structure

This paper contains 17 sections, 1 equation, 5 figures.

Figures (5)

  • Figure 1: Rapid response system architecture showing the end-to-end flow from threat intelligence ingestion to production deployment. Raw intelligence feeds the Threat Intelligence Platform, which generates detection signatures and training and validation data in the Data Platform, culminating in safe deployment through the Release Platform.
  • Figure 2: Threat Intelligence Platform architecture showing automated collection from multiple sources (OSINT, academic research, internal findings), followed by prioritization scoring, human analyst review, and conversion to actionable protections through signature development and attack dataset generation.
  • Figure 3: Screenshot of the IntelOps queue front end, showing date of ingestion, source title, affected models, TTPs, attack success rates, and analyst triage status, with additional filtering capabilities.
  • Figure 4: Data Platform
  • Figure 5: Release Platform architecture demonstrating immutable component deployment with concurrent versioning. The central orchestrator routes customer requests to appropriate guardrail versions, enabling seamless shadow deployments, gradual rollouts, and instant rollbacks while guaranteeing that already deployed guardrails cannot be accidentally disrupted during updates.
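
The Release Platform behavior described in the Figure 5 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the names `GuardrailVersion`, `Orchestrator`, `deploy`, `promote`, `set_shadow`, `rollback`, and `handle` are all hypothetical, the detectors are toy substring checks, and gradual rollouts are omitted for brevity. The sketch only shows the general idea that deployed versions are never modified in place, that a shadow version sees live traffic without affecting the customer-visible verdict, and that a rollback is a pointer switch rather than a redeploy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A detector returns True when the prompt should be blocked (toy assumption).
Detector = Callable[[str], bool]


@dataclass(frozen=True)
class GuardrailVersion:
    """Hypothetical immutable guardrail build; frozen so fields cannot be reassigned."""
    version: str
    detect: Detector


class Orchestrator:
    """Hypothetical router that keeps several guardrail versions deployed at once."""

    def __init__(self) -> None:
        self._versions: Dict[str, GuardrailVersion] = {}
        self._active: Optional[str] = None
        self._previous: Optional[str] = None
        self._shadow: Optional[str] = None
        # (prompt, active verdict, shadow verdict) tuples for offline comparison.
        self.shadow_log: List[Tuple[str, bool, bool]] = []

    def _require(self, version: str) -> None:
        if version not in self._versions:
            raise KeyError(f"unknown guardrail version: {version}")

    def deploy(self, guardrail: GuardrailVersion) -> None:
        """Register a new version; an existing version is never overwritten."""
        if guardrail.version in self._versions:
            raise ValueError(f"version {guardrail.version} is already deployed")
        self._versions[guardrail.version] = guardrail

    def set_shadow(self, version: str) -> None:
        """Run a version on live traffic without using its verdict."""
        self._require(version)
        self._shadow = version

    def promote(self, version: str) -> None:
        """Make a deployed version the customer-facing one."""
        self._require(version)
        self._previous, self._active = self._active, version
        if self._shadow == version:
            self._shadow = None

    def rollback(self) -> None:
        """Switch back to the previously active version; nothing is redeployed."""
        if self._previous is None:
            raise RuntimeError("no previous version to roll back to")
        self._active, self._previous = self._previous, None

    def handle(self, prompt: str) -> bool:
        """Return the active verdict; log the shadow verdict if a shadow is set."""
        if self._active is None:
            raise RuntimeError("no active guardrail version")
        verdict = self._versions[self._active].detect(prompt)
        if self._shadow is not None:
            shadow_verdict = self._versions[self._shadow].detect(prompt)
            self.shadow_log.append((prompt, verdict, shadow_verdict))
        return verdict


if __name__ == "__main__":
    orch = Orchestrator()
    orch.deploy(GuardrailVersion("v1", lambda p: "ignore previous instructions" in p.lower()))
    orch.deploy(GuardrailVersion("v2", lambda p: "ignore" in p.lower()))
    orch.promote("v1")
    orch.set_shadow("v2")
    print(orch.handle("Please ignore the typo"), orch.shadow_log)
```

In this toy run the broader `v2` detector disagrees with `v1` on a benign prompt, which is the kind of discrepancy a shadow deployment is meant to surface before promotion.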