SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis

Aidan Wong, He Cao, Zijing Liu, Yu Li

TL;DR

This paper explores the security vulnerabilities of LLMs within the field of chemistry, particularly their capacity to provide instructions for synthesizing hazardous substances, and introduces a novel attack technique named SMILES-prompting, which uses the Simplified Molecular-Input Line-Entry System to reference chemical substances.

Abstract

The increasing integration of large language models (LLMs) across various fields has heightened concerns about their potential to propagate dangerous information. This paper specifically explores the security vulnerabilities of LLMs within the field of chemistry, particularly their capacity to provide instructions for synthesizing hazardous substances. We evaluate the effectiveness of several prompt injection attack methods, including red-teaming, explicit prompting, and implicit prompting. Additionally, we introduce a novel attack technique named SMILES-prompting, which uses the Simplified Molecular-Input Line-Entry System (SMILES) to reference chemical substances. Our findings reveal that SMILES-prompting can effectively bypass current safety mechanisms. These findings highlight the urgent need for enhanced domain-specific safeguards in LLMs to prevent misuse and improve their potential for positive social impact.

Paper Structure

This paper contains 11 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Pipeline Overview. We tested four types of jailbreak attacks on 30 prohibited substances using leading LLMs. The results were assessed by a GPT-4o classifier on two criteria and then manually verified.
  • Figure 2: Success Rates of LLMs under Different Jailbreak Methods. Attack success rates (ASR) on 2 LLMs are shown for four attack methods: Implicit, SMILES, Red-Team, and Explicit prompting. The left panel presents success rates for component identification, while the right panel shows rates for process identification.
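
The figure captions report an attack success rate per method for each of the two criteria, but this section does not state how the rate is computed. The following is a minimal sketch, assuming ASR is simply the fraction of attempts judged successful for a given method and criterion. The function name `attack_success_rates`, the record layout, the method labels, and the verdict values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def attack_success_rates(records):
    """Compute per-method success rates for two criteria.

    records: iterable of (method, component_ok, process_ok) tuples, where the
    two flags are the (assumed) boolean verdicts for component identification
    and process identification.
    """
    totals = defaultdict(int)
    component_hits = defaultdict(int)
    process_hits = defaultdict(int)

    for method, component_ok, process_ok in records:
        totals[method] += 1
        component_hits[method] += bool(component_ok)
        process_hits[method] += bool(process_ok)

    # Assumed definition: successes divided by total attempts for that method.
    return {
        method: {
            "component": component_hits[method] / totals[method],
            "process": process_hits[method] / totals[method],
        }
        for method in totals
    }


# Made-up verdicts purely for illustration; not results from the paper.
records = [
    ("method_a", True, False),
    ("method_a", True, True),
    ("method_b", False, False),
    ("method_b", True, False),
]
rates = attack_success_rates(records)
```

Keeping the two criteria as separate counters mirrors the left/right split described for Figure 2; how the manual verification step feeds into the final verdicts is not specified here and is left out of the sketch.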