Death by a Thousand Prompts: Open Model Vulnerability Analysis

Amy Chang; Nicholas Conley; Harish Santhanalakshmi Ganesan; Adam Swanda

TL;DR

Open-weight LLMs are pervasively vulnerable to adversarial prompts, especially in multi-turn interactions, which exposes deployments to data leakage and manipulation. The study runs automated single-turn and multi-turn jailbreak tests against eight models with a GPT-3.5 Turbo attacker/scorer, finding multi-turn attack success rates (ASR) of up to 92.78% and significant safety gaps driven by alignment strategies. The work maps model-specific vulnerability profiles and offers concrete defensive practices, including layered guardrails and continuous red-teaming. The findings stress the need for security-focused design and monitoring to deploy open-weight LLMs safely in enterprise and public settings.

Abstract

Open-weight models provide researchers and developers with accessible foundations for diverse downstream applications. We tested the safety and security postures of eight open-weight large language models (LLMs) to identify vulnerabilities that may impact subsequent fine-tuning and deployment. Using automated adversarial testing, we measured each model's resilience against single-turn and multi-turn prompt injection and jailbreak attacks. Our findings reveal pervasive vulnerabilities across all tested models, with multi-turn attacks achieving success rates between 25.86% and 92.78%, representing a 2× to 10× increase over single-turn baselines. These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions. We assess that alignment strategies and lab priorities significantly influence resilience: capability-focused models such as Llama 3.3 and Qwen 3 demonstrate higher multi-turn susceptibility, whereas safety-oriented designs such as Google Gemma 3 exhibit more balanced performance. The analysis concludes that open-weight models, while crucial for innovation, pose tangible operational and ethical risks when deployed without layered security controls. These findings are intended to inform practitioners and developers of the potential risks and the value of professional AI security solutions to mitigate exposure. Addressing multi-turn vulnerabilities is essential to ensure the safe, reliable, and responsible deployment of open-weight LLMs in enterprise and public domains. We recommend adopting a security-first design philosophy and layered protections to ensure resilient deployments of open-weight models.
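
To make the arithmetic behind these figures concrete, the sketch below shows one plausible way an attack success rate and a multi-turn-over-single-turn ratio could be computed from per-attempt outcomes. The function names (`attack_success_rate`, `multi_turn_uplift`) and the sample outcomes are hypothetical illustrations, not the authors' scoring pipeline or data, which this page does not describe.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the percentage of attack attempts judged successful.

    Assumption: each attempt has already been labeled True (jailbreak
    succeeded) or False (model held its guardrails) by some scorer.
    """
    if not outcomes:
        raise ValueError("need at least one attempt")
    return 100.0 * sum(outcomes) / len(outcomes)


def multi_turn_uplift(single_turn_asr: float, multi_turn_asr: float) -> float:
    """Return how many times higher the multi-turn ASR is than the single-turn ASR."""
    if single_turn_asr <= 0:
        raise ValueError("single-turn ASR must be positive to form a ratio")
    return multi_turn_asr / single_turn_asr


# Hypothetical outcomes for one model (NOT results from the paper).
single_turn_outcomes = [True] * 2 + [False] * 18   # 2 of 20 attempts succeed
multi_turn_outcomes = [True] * 12 + [False] * 8    # 12 of 20 attempts succeed

single_asr = attack_success_rate(single_turn_outcomes)
multi_asr = attack_success_rate(multi_turn_outcomes)
uplift = multi_turn_uplift(single_asr, multi_asr)

print(f"single-turn ASR: {single_asr:.2f}%")
print(f"multi-turn ASR:  {multi_asr:.2f}%")
print(f"uplift:          {uplift:.1f}x")
```

With these made-up counts the uplift falls inside the 2× to 10× range quoted above; real evaluations would use the scorer's labels for each model and attack strategy.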
