Emerging Cyber Attack Risks of Medical AI Agents

Jianing Qiu, Lin Li, Jiankai Sun, Hao Wei, Zhe Xu, Kyle Lam, Wu Yuan

TL;DR

This work examines the cyberattack vulnerability of medical AI agents that access the web, focusing on adversarial prompts embedded in webpages. By evaluating multiple LLM backbones (e.g., OpenAI o1/o1-mini, DeepSeek-R1/V3, GPT-4o, Llama 3.2) with web browsing and email tools in a sandbox, the study quantifies the Attack Success Rate (ASR) across four attack types: injecting false information, manipulating recommendations, stealing private conversations, and hijacking computer systems. Results show substantial vulnerability: DeepSeek-R1 is the most susceptible (e.g., ASR up to $0.90$ for false information and $0.66$ for hijacking), and even leading models can leak patient data or elevate malicious content in recommendations. The paper discusses safeguards (such as content filtering, verification steps, and user-controlled prompts) and notes limitations such as the single-agent setup and limited tool access, underscoring the need for robust defenses as medical AI agents become more capable and autonomous.
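As a rough illustration of how an ASR figure like those above could be computed, the sketch below assumes ASR is the fraction of attack attempts that succeed, grouped by backbone model and attack type. This summary does not give the paper's exact definition, so that is an assumption here. The function name, the model label, and the trial records are all hypothetical and are not data from the paper.

```python
from collections import defaultdict


def attack_success_rate(trials):
    """Return ASR per (model, attack_type) pair.

    `trials` is an iterable of (model, attack_type, succeeded) tuples.
    Assumption: ASR = successful attempts / total attempts.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for model, attack_type, succeeded in trials:
        entry = counts[(model, attack_type)]
        entry[0] += int(succeeded)
        entry[1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


# Hypothetical trial log for illustration only (not results from the paper).
trials = [
    ("model-a", "false_information", True),
    ("model-a", "false_information", True),
    ("model-a", "false_information", False),
    ("model-a", "false_information", True),
    ("model-a", "hijack", False),
    ("model-a", "hijack", True),
]

rates = attack_success_rate(trials)
for (model, attack_type), asr in sorted(rates.items()):
    print(f"{model:8s} {attack_type:18s} ASR = {asr:.2f}")
```

Grouping by both model and attack type mirrors the way the results are reported above, with a separate rate for each backbone and each of the four attacks.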

Abstract

AI agents powered by large language models (LLMs) exhibit a high level of autonomy in addressing medical and healthcare challenges. With access to various tools, they can operate within an open-ended action space. However, as autonomy and capability increase, unforeseen risks also arise. In this work, we investigate one particular risk: the vulnerability of medical AI agents to cyber attacks, which arises because the agents access the Internet through web browsing tools. We show that, through adversarial prompts embedded in webpages, cyberattackers can: i) inject false information into the agent's response; ii) force the agent to manipulate its recommendations (e.g., of healthcare products and services); iii) steal historical conversations between the user and the agent, leaking sensitive or private medical information; and iv) hijack a computer system by having the targeted agent return a malicious URL in its response. We examined different backbone LLMs and found that such cyber attacks can succeed against agents powered by most mainstream LLMs, with reasoning models such as DeepSeek-R1 being the most vulnerable.

Paper Structure

This paper contains 14 sections, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Illustration of cyberattacks on medical AI agents.
  • Figure 2: Distribution of search queries in set #1, which contains queries that both the general public and healthcare professionals might search online for information.
  • Figure 3: Distribution of clinical search queries in set #2, which contains queries that clinicians might search online for information.
  • Figure 4: Success rate per search category in injecting false information attacks.
  • Figure 5: Success rate per search category in manipulating recommendation attacks.
  • ...and 1 more figure