ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh

TL;DR

ArtPerception targets safety vulnerabilities in LLMs that arise from non-semantic, multimodal interpretations of text, such as embedded ASCII art. It proposes a two-phase black-box jailbreak framework: in Phase 1, a benign pre-test builds a model-specific recognition profile across fonts, orientations, and prompts; in Phase 2, a highly efficient one-shot attack uses that profile. A Modified Levenshtein Distance metric ($MLD$) complements accuracy ($Acc$) to quantify partial ASCII-art recognition. Empirical evaluation on four open-source LLMs, together with transfer experiments on commercial models, shows a strong correlation between recognition ability and jailbreak success as well as notable efficiency gains, while defenses mitigate but do not fully block the attacks.

Abstract

The integration of Large Language Models (LLMs) into computer applications has introduced transformative capabilities but also significant security challenges. Existing safety alignments, which primarily focus on semantic interpretation, leave LLMs vulnerable to attacks that use non-standard data representations. This paper introduces ArtPerception, a novel black-box jailbreak framework that strategically leverages ASCII art to bypass the security measures of state-of-the-art (SOTA) LLMs. Unlike prior methods that rely on iterative, brute-force attacks, ArtPerception introduces a systematic, two-phase methodology. Phase 1 conducts a one-time, model-specific pre-test to empirically determine the optimal parameters for ASCII art recognition. Phase 2 leverages these insights to launch a highly efficient, one-shot malicious jailbreak attack. We propose a Modified Levenshtein Distance (MLD) metric for a more nuanced evaluation of an LLM's recognition capability. Through comprehensive experiments on four SOTA open-source LLMs, we demonstrate superior jailbreak performance. We further validate our framework's real-world relevance by showing its successful transferability to leading commercial models, including GPT-4o, Claude Sonnet 3.7, and DeepSeek-V3, and by conducting a rigorous effectiveness analysis against potential defenses such as LLaMA Guard and Azure's content filters. Our findings underscore that true LLM security requires defending against a multi-modal space of interpretations, even within text-only inputs, and highlight the effectiveness of strategic, reconnaissance-based attacks. Content Warning: This paper includes potentially harmful and offensive model outputs.
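To make the two recognition metrics named in the abstract concrete, the sketch below computes exact-match accuracy and a length-normalized edit distance over benign word pairs. The abstract does not state how the paper modifies the Levenshtein distance, so this uses the standard (unmodified) distance as a stand-in; the function names and the normalization by the longer string's length are assumptions made for illustration, not the authors' definition of MLD.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance (insert, delete, substitute all cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def exact_match_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions equal to their target (case-insensitive)."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    hits = sum(p.strip().upper() == t.strip().upper()
               for p, t in zip(predictions, targets))
    return hits / len(targets)


def mean_normalized_distance(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Mean edit distance divided by the longer string's length.

    ASSUMPTION: this normalization is illustrative only; it is not the
    paper's Modified Levenshtein Distance.
    """
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(predictions, targets):
        p, t = p.strip().upper(), t.strip().upper()
        total += levenshtein(p, t) / max(len(p), len(t), 1)
    return total / len(targets)


if __name__ == "__main__":
    preds = ["APPLE", "APPLF"]   # hypothetical model readings of benign words
    gold = ["APPLE", "APPLE"]
    print(exact_match_accuracy(preds, gold))
    print(mean_normalized_distance(preds, gold))
```

The point of pairing the two numbers is that a near miss such as "APPLF" counts as a full failure under accuracy but only a small penalty under an edit-distance score, which is the kind of partial recognition the abstract says MLD is meant to capture.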

Paper Structure

This paper contains 38 sections, 1 equation, 20 figures, 9 tables.

Figures (20)

  • Figure 1: Example of a direct instruction versus an ArtPerception attack. Attacker A's direct harmful instruction is rejected. Attacker B uses ArtPerception to encode keywords "ANOREXIA" (MASK1) and "BULIMIA" (MASK2) as ASCII art, successfully bypassing safety measures.
  • Figure 2: The ArtPerception Framework: Phase 1 (Pre-test) empirically determines the best ASCII art recognition techniques (font, orientation, hint strategy, CoT, ICL) for a target LLM via benign queries. Phase 2 (Attack) leverages these findings to construct and execute a targeted, efficient one-shot jailbreak attack against the target LLM.
  • Figure 3: Recognition Pre-test Heatmap for LLaMA-3-8B showing Acc and MLD across various fonts and techniques. The red boxes highlight the font ('keyboard') and technique ('Tail-Horizontal') that yielded optimal results, forming part of its Top-1 Technique Set.
  • Figure 4: Recognition Pre-test Heatmap for Gemma-2-9B. The red boxes highlight the 'cards' font and the 'Head-Vertical' technique as optimal.
  • Figure 5: Recognition Pre-test Heatmap for Mistral-7B-v0.3. The red boxes highlight the 'keyboard' font and the 'Head-Horizontal' / 'Mid-Horizontal' techniques as optimal.
  • ...and 15 more figures