A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

Jinghao Wang, Ping Zhang, Carter Yagemann

TL;DR

The paper identifies two critical security risks in medical AI, jailbreaking and privacy leakage, and argues that research on them is held back by requirements for GPU clusters, commercial API access, and protected health information (PHI). It proposes a practical, fully reproducible framework that runs on consumer CPU hardware with synthetic data and covers multiple clinical specialties stratified by risk level. The protocol specifies target models, hardware configurations, generation parameters, and standardized metrics (Attack Success Rate, or ASR; privacy leakage; and specialty stratification) with rigorous statistical analysis. By enabling zero-cost, cross-domain security assessment and defense testing, the framework aims to accelerate community-driven safety research and guide safer deployment of medical AI systems. Extensions to multimodal and commercial systems are left to future work.

Abstract

Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data -- barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk -- from high-risk domains such as emergency medicine and psychiatry to general practice -- addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic patient records requiring no IRB approval. The framework is designed to run entirely on consumer CPU hardware using freely available models, eliminating cost barriers. We present the framework specification including threat models, data generation methodology, evaluation protocols, and scoring rubrics. This proposal establishes a foundation for comparative security assessment of medical-specialist models and defense mechanisms, advancing the broader goal of ensuring safe and trustworthy medical AI systems.

Paper Structure

This paper contains 48 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Overview of the proposed medical AI security evaluation framework. The pipeline progresses from left to right: (1) clinical specialty selection based on risk level, (2) synthetic patient data generation with PHI placeholders, (3) attack template application across jailbreaking and privacy extraction categories, (4) model evaluation using freely available LLMs, (5) response scoring using standardized rubrics, and (6) metric computation including Attack Success Rate.
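
The final stage of the pipeline in Figure 1, metric computation, can be illustrated with a minimal Python sketch. It assumes that Attack Success Rate is the fraction of attack attempts that the scoring rubric judged successful, reported per clinical specialty; the listing does not give the paper's exact definition, so this is an assumption. The names `ScoredResponse` and `attack_success_rate`, and the specialty and attack labels, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredResponse:
    """One model response after rubric scoring (hypothetical record layout)."""
    specialty: str          # clinical specialty the prompt targeted
    attack_category: str    # e.g. role-playing or authority impersonation
    attack_succeeded: bool  # rubric verdict: did the attack succeed?


def attack_success_rate(responses):
    """Return {specialty: ASR}, where ASR = successful attacks / total attempts.

    This definition is an assumption made for the sketch, not a formula
    quoted from the paper.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for response in responses:
        totals[response.specialty] += 1
        if response.attack_succeeded:
            successes[response.specialty] += 1
    # Every specialty in `totals` has at least one attempt, so no division by zero.
    return {specialty: successes[specialty] / totals[specialty] for specialty in totals}


# Illustrative, made-up rubric verdicts; not results from the paper.
example_responses = [
    ScoredResponse("emergency_medicine", "role_playing", True),
    ScoredResponse("emergency_medicine", "authority_impersonation", False),
    ScoredResponse("general_practice", "role_playing", False),
]
asr_by_specialty = attack_success_rate(example_responses)
```

Grouping by specialty before dividing keeps high-risk and general-practice domains separate, which matches the risk-stratified reporting the caption describes. The same loop could be keyed on `attack_category` to compare attack types instead.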