Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs
Andrew Maranhão Ventura D'addario
TL;DR
The paper argues that generic harm definitions are insufficient for healthcare LLM safety and proposes context-aware safety anchored in the Brazilian Unified Health System (SUS). It introduces Medical Malice, a dataset of 214,219 adversarial prompts generated by an unaligned agent (Grok-4) within a persona-driven pipeline. The dataset covers seven harm categories and pairs each prompt with a rationale explaining the violation, so that models can learn ethical boundaries. It releases these "vulnerability signatures" (prompts and rationales) without completions, to support red-teaming and the immunization of LLMs against nuanced, system-specific threats. The work advocates a shift from universal to context-aware safety and discusses ethical considerations and applicability to other health systems.
Abstract
The integration of Large Language Models (LLMs) into healthcare demands a safety paradigm rooted in primum non nocere. However, current alignment techniques rely on generic definitions of harm that fail to capture context-dependent violations, such as administrative fraud and clinical discrimination. To address this, we introduce Medical Malice: a dataset of 214,219 adversarial prompts calibrated to the regulatory and ethical complexities of the Brazilian Unified Health System (SUS). Crucially, the dataset includes the reasoning behind each violation, enabling models to internalize ethical boundaries rather than merely memorizing a fixed set of refusals. Using an unaligned agent (Grok-4) within a persona-driven pipeline, we synthesized high-fidelity threats across seven harm categories, ranging from procurement manipulation and queue-jumping to obstetric violence. We discuss the ethical design of releasing these "vulnerability signatures" to correct the information asymmetry between malicious actors and AI developers. Ultimately, this work advocates for a shift from universal to context-aware safety, providing the necessary resources to immunize healthcare AI against the nuanced, systemic threats inherent to high-stakes medical environments. These vulnerabilities represent the paramount risk to patient safety and to the successful integration of AI in healthcare systems.
