Assessing Prompt Injection Risks in 200+ Custom GPTs

Jiahao Yu, Yuhang Wu, Dong Shu, Mingyu Jin, Sabrina Yang, Xinyu Xing

TL;DR

This paper investigates prompt injection risks in user-customized GPTs, identifying system prompt extraction and file leakage as critical vulnerabilities. The authors propose a three-step attack methodology and validate it through large-scale testing on 200+ GPTs, achieving high success rates. They also evaluate a defense mechanism via red-teaming and reveal that current defenses can be bypassed, even with code interpreters present. The work emphasizes the need for robust, multi-faceted security measures and responsible disclosure to stakeholders to protect privacy and IP while preserving customization benefits.

Abstract

In the rapidly evolving landscape of artificial intelligence, ChatGPT has been widely used in various applications. A new feature, the customization of ChatGPT models by users to cater to specific needs, has opened new frontiers in AI utility. However, this study reveals a significant security vulnerability inherent in these user-customized GPTs: prompt injection attacks. Through comprehensive testing of over 200 user-designed GPT models with adversarial prompts, we demonstrate that these systems are susceptible to prompt injection. Through prompt injection, an adversary can not only extract the customized system prompts but also access the uploaded files. This paper provides a first-hand analysis of prompt injection, alongside an evaluation of possible mitigations for such attacks. Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models. The intent of this paper is to raise awareness and prompt action in the AI community, ensuring that the benefits of GPT customization do not come at the cost of compromised security and privacy.

Paper Structure

This paper contains 14 sections, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Proposed prompt injection method to extract system prompts and files from custom GPTs.
  • Figure 2: Privacy issues with OpenAI interfaces. In the left figure, the exposed filenames could be exploited. In the right figure, we could learn how the user designed the plugin prototype for the custom GPT.