Comprehensive Evaluation of ChatGPT Reliability Through Multilingual Inquiries

Poorna Chander Reddy Puttaparthi; Soham Sanjay Deo; Hakan Gul; Yiming Tang; Weiyi Shang; Zhe Yu

TL;DR

This study examines the risk of jailbreak vulnerabilities in ChatGPT across multiple languages using a fuzzing-based framework with three core prompt-wrapping strategies and a prompt-injection variant, analyzing 7,892 Q&A data points spanning 121 languages. It demonstrates that multilingual wrapping can trigger jailbreak and that prompt injection substantially amplifies this risk, with both language and content playing significant roles in outcomes. The work provides concrete insights for improving language-diversity safety checks and highlights potential biases introduced by English-dominant training data, while offering open-source data and code to enable replication and broader evaluation.

Abstract

ChatGPT is currently the most popular large language model (LLM), with over 100 million users, and has a significant impact on people's lives. However, because of jailbreak vulnerabilities, ChatGPT can also have negative effects on people's lives, potentially even facilitating criminal activities. Testing whether ChatGPT can be jailbroken is crucial because it can improve ChatGPT's security, reliability, and social responsibility. Inspired by previous research revealing that the performance of LLMs varies across translations into different languages, we suspected that wrapping prompts in multiple languages might lead to a ChatGPT jailbreak. To investigate this, we designed a study that uses a fuzz-testing approach to analyze ChatGPT's cross-linguistic proficiency. Our study includes three strategies that automatically pose malicious questions to ChatGPT in different formats: (1) malicious questions each written in only one language, (2) malicious questions written in multiple languages, and (3) prompts specifying that ChatGPT respond in a language different from that of the prompt. In addition, we combine these strategies with prompt injection templates that wrap the three aforementioned types of questions. We examined a total of 7,892 Q&A data points and found that multilingual wrapping can indeed lead to ChatGPT jailbreak, with different wrapping methods having varying effects on jailbreak probability. Prompt injection can amplify the probability of jailbreak caused by multilingual wrapping. This work provides insights for OpenAI developers to enhance ChatGPT's support for language diversity and inclusion.
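The test matrix implied by the three strategies and the prompt-injection variant can be sketched as follows. This is only an illustrative enumeration of configurations, not the authors' framework: the names `TestCase` and `enumerate_cases` are hypothetical, and the choice to mix exactly two languages in strategy (2) is an assumption, since the abstract does not state how many languages a multilingual question combines. The sketch builds no prompt text; it only lists which language combinations would be exercised.

```python
from dataclasses import dataclass
from itertools import permutations, product
from typing import Iterator, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TestCase:
    """One fuzzing configuration (hypothetical structure, for illustration)."""
    strategy: str                      # "single", "mixed", or "respond_in"
    question_id: int                   # index into a question collection
    prompt_languages: Tuple[str, ...]  # language(s) the question is written in
    response_language: Optional[str]   # requested answer language, if any
    injected: bool                     # True if wrapped in an injection template


def enumerate_cases(
    question_ids: Sequence[int],
    languages: Sequence[str],
    with_injection: Sequence[bool] = (False, True),
) -> Iterator[TestCase]:
    """Yield every configuration for the three strategies, with and without injection."""
    for qid, injected in product(question_ids, with_injection):
        # Strategy 1: the whole question in exactly one language.
        for lang in languages:
            yield TestCase("single", qid, (lang,), None, injected)
        # Strategy 2 (assumed here to mix two languages): ordered language pairs.
        for a, b in permutations(languages, 2):
            yield TestCase("mixed", qid, (a, b), None, injected)
        # Strategy 3: question in one language, answer requested in a different one.
        for a, b in permutations(languages, 2):
            yield TestCase("respond_in", qid, (a,), b, injected)
```

With `q` questions and `n` languages, this enumeration yields `2 * q * (n + 2 * n * (n - 1))` configurations, which shows how quickly the search space grows and why an automated, fuzzing-style driver is a natural fit.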

Paper Structure (23 sections, 13 figures, 4 tables)

This paper contains 23 sections, 13 figures, 4 tables.

Introduction
Motivating Example
Study Design
Research Questions
Data Collection
Question Collection
Language Selection
Jailbreak Identification
Experimental Results
Questions in Different Languages Separately (RQ1)
Questions Written in Multiple Languages (RQ2)
Respond in a Language Different from the Question's Language (RQ3)
Prompt Injection with Multilingual Wrapping (RQ4)
Discussion
How does our study help OpenAI reduce the probability of ChatGPT's jailbreak?
...and 8 more sections

Figures (13)

Figure 1: An example of a user asking a malicious question to ChatGPT in different languages and ChatGPT's responses.
Figure 2: An overview of our study.
Figure 3: Answers' label distribution when ChatGPT responds to different languages (RQ1).
Figure 4: An example of a FALSE answer assuming the difficulties the asker is facing and offering suggestions.
Figure 5: An example of a TRUE answer not directly addressing the question but containing malicious information.
...and 8 more figures
