Table of Contents
Fetching ...

SPML: A DSL for Defending Language Models Against Prompt Attacks

Reshabh K Sharma, Vinayak Gupta, Dan Grossman

TL;DR

System Prompt Meta Language (SPML) is presented, a domain-specific language for refining prompts and monitoring the inputs to the LLM-based chatbots, and introduces a groundbreaking benchmark, offering the inaugural language and benchmark for chatbot definition evaluation.

Abstract

Large language models (LLMs) have profoundly transformed natural language applications, with a growing reliance on instruction-based definitions for designing chatbots. However, post-deployment the chatbot definitions are fixed and are vulnerable to attacks by malicious users, emphasizing the need to prevent unethical applications and financial losses. Existing studies explore user prompts' impact on LLM-based chatbots, yet practical methods to contain attacks on application-specific chatbots remain unexplored. This paper presents System Prompt Meta Language (SPML), a domain-specific language for refining prompts and monitoring the inputs to the LLM-based chatbots. SPML actively checks attack prompts, ensuring user inputs align with chatbot definitions to prevent malicious execution on the LLM backbone, optimizing costs. It also streamlines chatbot definition crafting with programming language capabilities, overcoming natural language design challenges. Additionally, we introduce a groundbreaking benchmark with 1.8k system prompts and 20k user inputs, offering the inaugural language and benchmark for chatbot definition evaluation. Experiments across datasets demonstrate SPML's proficiency in understanding attacker prompts, surpassing models like GPT-4, GPT-3.5, and LLAMA. Our data and codes are publicly available at: https://prompt-compiler.github.io/SPML/.

SPML: A DSL for Defending Language Models Against Prompt Attacks

TL;DR

System Prompt Meta Language (SPML) is presented, a domain-specific language for refining prompts and monitoring the inputs to the LLM-based chatbots, and introduces a groundbreaking benchmark, offering the inaugural language and benchmark for chatbot definition evaluation.

Abstract

Large language models (LLMs) have profoundly transformed natural language applications, with a growing reliance on instruction-based definitions for designing chatbots. However, post-deployment the chatbot definitions are fixed and are vulnerable to attacks by malicious users, emphasizing the need to prevent unethical applications and financial losses. Existing studies explore user prompts' impact on LLM-based chatbots, yet practical methods to contain attacks on application-specific chatbots remain unexplored. This paper presents System Prompt Meta Language (SPML), a domain-specific language for refining prompts and monitoring the inputs to the LLM-based chatbots. SPML actively checks attack prompts, ensuring user inputs align with chatbot definitions to prevent malicious execution on the LLM backbone, optimizing costs. It also streamlines chatbot definition crafting with programming language capabilities, overcoming natural language design challenges. Additionally, we introduce a groundbreaking benchmark with 1.8k system prompts and 20k user inputs, offering the inaugural language and benchmark for chatbot definition evaluation. Experiments across datasets demonstrate SPML's proficiency in understanding attacker prompts, surpassing models like GPT-4, GPT-3.5, and LLAMA. Our data and codes are publicly available at: https://prompt-compiler.github.io/SPML/.
Paper Structure (57 sections, 5 figures, 1 table)

This paper contains 57 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Illustrative example of a user user engaging with a chatbot operating on the LLM backbone, while SPML diligently monitors user inputs for any potential malicious prompts. The dashed line and the corresponding chat message depicts the output in the absence of SPML.
  • Figure 2: Overview of the SPML Compilation and Monitoring Pipeline for Prompt Injection Detection
  • Figure 3: An end-to-end example involves a data entry in our dataset. Each entry comprises an intermediate presentation for a specific prompt, providing a structured definition of the characteristics within the prompt. It also includes a set of user prompts with labels indicating whether they are safe or from an attacker. The dataset additionally contains details about the intermediate representation of the user prompts, which is utilized to determine whether the user is an attacker or not.
  • Figure 4: Performance of GPT models and SPML in detecting intrusion attacks across different levels of system prompt violations.
  • Figure 5: Performance of SPML and GPT models across different values of temperature parameters. We report the performance on a smaller subset of examples, where we observed the most randomness. This plot is to show that across different temperatures, due to the prompt language ability of SPML, it achieves consistent performance across all settings.