
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models

Johan Wahréus, Ahmed Mohamed Hussain, Panos Papadimitratos

TL;DR

CySecBench introduces a domain-specific dataset of 12,662 prompts across 10 cybersecurity attack categories to enable precise evaluation of LLM jailbreaking. It details a two-phase prompt generation and filtering pipeline and proposes a prompt-obfuscation jailbreak. Evaluated on ChatGPT, Gemini, and Claude, the jailbreak reaches a success rate (SR) of up to 88.4% and an AR of up to 4.77 on CySecBench. With refinements, it reaches SR = 78.5% and AR = 4.23 on AdvBench. The work demonstrates significant model-dependent differences in resilience, argues that domain-specific evaluation is needed for robust safety measures, and outlines future directions for cross-domain datasets, automated maintenance, and multi-modal security assessment. Overall, CySecBench provides a scalable, domain-tailored framework for benchmarking and improving LLM safety against targeted cyberattack prompts, with practical implications for both dataset design and model defense.

Abstract

Numerous studies have investigated methods for jailbreaking Large Language Models (LLMs) to generate harmful content. Typically, these methods are evaluated using datasets of malicious prompts designed to bypass security policies established by LLM providers. However, the generally broad scope and open-ended nature of existing datasets can complicate the assessment of jailbreaking effectiveness, particularly in specific domains such as cybersecurity. To address this issue, we present and publicly release CySecBench, a comprehensive dataset containing 12,662 prompts specifically designed to evaluate jailbreaking techniques in the cybersecurity domain. The dataset is organized into 10 distinct attack-type categories and features closed-ended prompts, enabling a more consistent and accurate assessment of jailbreaking attempts. Furthermore, we detail our methodology for dataset generation and filtration, which can be adapted to create similar datasets in other domains. To demonstrate the utility of CySecBench, we propose and evaluate a jailbreaking approach based on prompt obfuscation. Our experimental results show that this method successfully elicits harmful content from commercial black-box LLMs, achieving Success Rates (SRs) of 65% with ChatGPT and 88% with Gemini; in contrast, Claude demonstrated greater resilience, with a jailbreaking SR of 17%. Compared to existing benchmark approaches, our method shows superior performance, highlighting the value of domain-specific evaluation datasets for assessing LLM security measures. Moreover, when evaluated using prompts from a widely used dataset (i.e., AdvBench), it achieved an SR of 78.5%, higher than that of state-of-the-art methods.
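The SR and AR figures quoted above are aggregates of per-prompt judge verdicts. The Python sketch below shows one way such an aggregation could be computed overall and per attack category. It rests on assumptions that are ours, not definitions taken from the paper: a judge assigns each response an integer score from 1 to 5, AR is the mean score, and SR is the fraction of responses scoring at or above a chosen threshold. `JudgedResponse`, `success_rate`, `average_rating`, and `per_category` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class JudgedResponse:
    """One model response with its judge score (hypothetical record type)."""
    category: str  # attack-type category of the originating prompt
    score: int     # ASSUMPTION: integer judge score on a 1-5 scale

    def __post_init__(self) -> None:
        if not 1 <= self.score <= 5:
            raise ValueError(f"score must be in [1, 5], got {self.score}")


def success_rate(responses: list[JudgedResponse], threshold: int) -> float:
    """Fraction of responses whose score is >= threshold (assumed SR definition)."""
    if not responses:
        raise ValueError("no responses to aggregate")
    hits = sum(1 for r in responses if r.score >= threshold)
    return hits / len(responses)


def average_rating(responses: list[JudgedResponse]) -> float:
    """Mean judge score over all responses (assumed AR definition)."""
    if not responses:
        raise ValueError("no responses to aggregate")
    return sum(r.score for r in responses) / len(responses)


def per_category(
    responses: list[JudgedResponse], threshold: int
) -> dict[str, tuple[float, float]]:
    """Map each category to its (SR, AR) pair."""
    groups: dict[str, list[JudgedResponse]] = {}
    for r in responses:
        groups.setdefault(r.category, []).append(r)
    return {
        cat: (success_rate(group, threshold), average_rating(group))
        for cat, group in sorted(groups.items())
    }
```

For example, with placeholder categories `"a"` and `"b"`, scores 5, 1, 4, 5, 2, and a threshold of 4, `success_rate` returns 0.6 and `average_rating` returns 3.4. Keeping the threshold as an explicit parameter makes it clear that the SR depends on where the success cutoff is placed on the judge scale.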

Paper Structure

This paper contains 13 sections, 2 equations, 15 figures, 6 tables, and 3 algorithms.

Figures (15)

  • Figure 1: LLM instructions used in the second phase of the filtering process listed in the filtering algorithm.
  • Figure 2: Instructions provided to the LLM to generate exam questions.
  • Figure 3: Instructions provided to the LLM to generate exam solutions.
  • Figure 4: The proposed jailbreaking architecture.
  • Figure 5: Instructions provided to the GPT judge.
  • ...and 10 more figures