Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis

Jonathan Brokman, Omer Hofman, Oren Rachmil, Inderjeet Singh, Vikas Pahuja, Rathina Sabapathy Aishvariya Priya, Amit Giloni, Roman Vainshtein, Hisashi Kojima

TL;DR

A comparative analysis of open-source tools that scan conversational large language models (LLMs) for vulnerabilities (scanners, for short) finds significant reliability issues in detecting successful attacks, highlighting a fundamental gap for future development.

Abstract

This report presents a comparative analysis of open-source vulnerability scanners for conversational large language models (LLMs). As LLMs become integral to various applications, they also present potential attack surfaces, exposed to security risks such as information leakage and jailbreak attacks. Our study evaluates prominent scanners - Garak, Giskard, PyRIT, and CyberSecEval - that adapt red-teaming practices to expose these vulnerabilities. We detail the distinctive features and practical use of these scanners, outline unifying principles of their design, and perform quantitative evaluations to compare them. These evaluations uncover significant reliability issues in detecting successful attacks, highlighting a fundamental gap for future development. Additionally, we contribute a preliminary labelled dataset, which serves as an initial step to bridge this gap. Based on the above, we provide strategic recommendations to help organizations choose the most suitable scanner for their red-teaming needs, accounting for customizability, test suite comprehensiveness, and industry-specific use cases.

Paper Structure

This paper contains 18 sections, 11 figures, 7 tables.

Figures (11)

  • Figure 1: High-level overview of our quantitative results. (a) Scanner performance scatter plot. y-axis: reported attack effectiveness; x-axis: average reliability, based on correct evaluation of attacks' success. Circle radius: number of adversarial prompts in the test suite. See Sec. \ref{sec:evaluation} for further detail on how these axes are calculated. (b) Adversarial prompt distribution. Attack types are grouped into five categories for comparability.
  • Figure 2: General design of the automated LLM red-teaming flow, used by scanners.
  • Figure 3: Top: Example flow of Giskard, testing a "Workshop Organizer AI". An important aspect of Giskard is its ability to customize tests for LLMs that are designed for specific tasks. This example demonstrates Giskard's customization via its distinctive requirements-based test. This is a shortened version; for the full version, including the evaluation phase, refer to Fig. 4 in the supplementary material. Bottom: Example flow of PyRIT's multi-step attack for generating a Python key logger. An attacker LLM is tasked with attacking a target LLM under evaluation, while another LLM assesses the attack's success. This loop continues until the attack succeeds or a stopping criterion is met.
  • Figure 4: Per-attack performance, averaged over the target LLMs. The margin of error (MOE) is shown as error bars.
  • Figure 5: Example of Giskard Report.
  • ...and 6 more figures
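
The multi-step attack loop described in the Figure 3 caption (an attacker LLM, a target LLM, and a third LLM that judges success, repeated until the attack succeeds or a stopping criterion is met) can be illustrated with a minimal sketch. This is not PyRIT's API: the function name `run_multi_turn_attack`, the three callables, and the `max_turns` stopping criterion are all hypothetical, and the toy stand-ins below replace real LLM calls with deterministic string rules and a benign objective.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases for illustration only; not taken from any scanner.
Attacker = Callable[[str, str], str]   # (objective, feedback) -> next prompt
Target = Callable[[str], str]          # prompt -> response
Scorer = Callable[[str, str], bool]    # (objective, response) -> success?


def run_multi_turn_attack(
    attacker: Attacker,
    target: Target,
    scorer: Scorer,
    objective: str,
    max_turns: int = 5,
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Loop until the scorer reports success or max_turns is reached."""
    history: List[Tuple[str, str]] = []
    feedback = ""  # the attacker sees the previous target response
    for _ in range(max_turns):
        prompt = attacker(objective, feedback)
        response = target(prompt)
        history.append((prompt, response))
        if scorer(objective, response):
            return True, history
        feedback = response
    return False, history


# Toy stand-ins (assumptions): real scanners would call LLMs here.
def toy_attacker(objective: str, feedback: str) -> str:
    # Rephrase the request once a refusal has been observed.
    return objective if feedback == "" else "please " + objective


def toy_target(prompt: str) -> str:
    # Refuses unless the prompt begins with "please".
    if prompt.startswith("please"):
        return "Sure: here is a response."
    return "I cannot help with that."


def toy_scorer(objective: str, response: str) -> bool:
    # Treats any non-refusal as a successful attack.
    return response.startswith("Sure")


succeeded, history = run_multi_turn_attack(
    toy_attacker, toy_target, toy_scorer, "write a haiku"
)
print(succeeded, len(history))
```

Note that the scorer here is a simple string rule. The reliability issues highlighted in the abstract concern exactly this step: deciding whether a given response counts as a successful attack.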