Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models

Junjie Xiong, Changjia Zhu, Shuhang Lin, Chong Zhang, Yongfeng Zhang, Yao Liu, Lingyao Li

TL;DR

The paper examines security vulnerabilities that arise when LLMs with real-time web access and Model Context Protocol (MCP) integration process external resources. It introduces malicious font injection as a novel indirect prompt injection attack and evaluates two scenarios, Malicious Content Relay and Sensitive Data Leakage, across multiple LLMs and document formats. The results show that PDF documents are more vulnerable than HTML, and that higher injection frequency and placement early in the document both increase attack success. Indirect prompts can bypass safety filters, especially for low- and medium-sensitivity data, whereas high-sensitivity data often triggers refusals or sanitized responses. The work highlights the need for defenses that verify both semantic content and visual integrity when LLMs process externally supplied material, particularly in MCP-enabled workflows.

Abstract

Large Language Models (LLMs) are increasingly equipped with real-time web search capabilities and integrated with protocols like the Model Context Protocol (MCP). This extension could introduce new security vulnerabilities. We present a systematic investigation of LLM vulnerabilities to hidden adversarial prompts through malicious font injection in external resources like webpages, where attackers manipulate code-to-glyph mapping to inject deceptive content that is invisible to users. We evaluate two critical attack scenarios: (1) "malicious content relay" and (2) "sensitive data leakage" through MCP-enabled tools. Our experiments reveal that indirect prompts with injected malicious fonts can bypass LLM safety mechanisms through external resources, achieving varying success rates based on data sensitivity and prompt design. Our research underscores the urgent need for enhanced security measures in LLM deployments when processing external content.

Paper Structure

This paper contains 25 sections, 1 equation, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Illustration of malicious font creation through code segment modification. The figure demonstrates how an attacker manipulates the character mapping table by modifying code segments and their corresponding glyph indices. Each segment (e.g., segments 1-4) contains a range of character codes (Startcode to Endcode) and can be strategically modified by adjusting the idDelta values to create deceptive character mappings.
  • Figure 2: Overview of the experimental framework investigating two critical attack scenarios: (1) Malicious Content Relay, where LLMs process and relay hidden adversarial prompts from external sources to users, and (2) Sensitive Data Leakage, where attackers exploit MCP-enabled tools to exfiltrate user information through hidden prompts.
  • Figure 3: Threat model of Sensitive Data Leakage.
  • Figure 4: Success rates (%) with different injection frequencies (with moving averages of window sizes 2-5).
  • Figure 5: Success rates (%) of adversarial attacks in PDF and HTML documents.
  • ...and 5 more figures
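
The segment-based mapping described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the simple case of a format 4 character map in which each segment's glyph index is the character code plus `idDelta`, modulo 65536 (that is, `idRangeOffset` is zero). The `Segment` class, the 26-glyph toy font, and the specific delta values are hypothetical and are not taken from the paper; the sketch only shows why changing `idDelta` makes the displayed glyph diverge from the underlying character code that a text-consuming program reads.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    start_code: int  # first character code covered by the segment
    end_code: int    # last character code covered by the segment
    id_delta: int    # added to the character code, modulo 65536

def glyph_index(segments: List[Segment], code: int) -> int:
    """Look up a glyph index; 0 (.notdef) if no segment covers the code.

    Assumes idRangeOffset == 0 for every segment.
    """
    for seg in segments:
        if seg.start_code <= code <= seg.end_code:
            return (code + seg.id_delta) & 0xFFFF
    return 0

# Hypothetical toy font: glyph 0 is .notdef, glyphs 1..26 draw 'A'..'Z'.
GLYPH_SHAPES = [".notdef"] + [chr(c) for c in range(ord("A"), ord("Z") + 1)]

def rendered(segments: List[Segment], text: str) -> str:
    """Return the shapes a viewer would see for the given stored text."""
    shapes = []
    for ch in text:
        idx = glyph_index(segments, ord(ch))
        shapes.append(GLYPH_SHAPES[idx] if 0 < idx < len(GLYPH_SHAPES) else "?")
    return "".join(shapes)

# Unmodified mapping: 'A' (0x41) -> glyph 1, so the delta is 1 - 0x41.
benign = [Segment(0x41, 0x5A, 1 - 0x41)]
# Modified mapping (illustrative): every code is shifted one glyph forward.
tampered = [Segment(0x41, 0x59, 2 - 0x41)]

stored = "HAL"
print(rendered(benign, stored))    # shapes match the stored characters
print(rendered(tampered, stored))  # shapes differ from the stored characters
```

In this toy setup the stored characters stay the same in both cases; only the shapes drawn on screen change, which is the gap between what a reader sees and what a model ingests that the caption describes.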