RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search

Quy-Anh Dang, Chris Ngo, Truong-Son Hy

TL;DR

RainbowPlus reframes adversarial prompt generation for large language models (LLMs) as an adaptive evolutionary quality-diversity search, addressing the scalability and diversity limitations of prior red-teaming methods. It introduces a multi-element archive that stores multiple high-quality prompts per behavioral niche and replaces pairwise scoring with a probabilistic fitness evaluation, which enables batch assessment of prompts and linear-time search complexity. Across six benchmarks and twelve LLMs, RainbowPlus achieves higher attack success rates and maintains a Diverse-Score of around 0.84, generating up to 100 times more unique prompts than prior QD methods while often reducing runtime (e.g., ~1.45 hours on HarmBench vs. ~13.5 hours for AutoDAN-Turbo). The open-source implementation supports reproducibility and further research in LLM safety, positioning RainbowPlus as a scalable tool for vulnerability assessment.

Abstract

Large Language Models (LLMs) exhibit remarkable capabilities but are susceptible to adversarial prompts that exploit vulnerabilities to produce unsafe or biased outputs. Existing red-teaming methods often face scalability challenges, resource-intensive requirements, or limited diversity in attack strategies. We propose RainbowPlus, a novel red-teaming framework rooted in evolutionary computation, enhancing adversarial prompt generation through an adaptive quality-diversity (QD) search that extends classical evolutionary algorithms like MAP-Elites with innovations tailored for language models. By employing a multi-element archive to store diverse high-quality prompts and a comprehensive fitness function to evaluate multiple prompts concurrently, RainbowPlus overcomes the constraints of single-prompt archives and pairwise comparisons in prior QD methods like Rainbow Teaming. Experiments comparing RainbowPlus to QD methods across six benchmark datasets and four open-source LLMs demonstrate superior attack success rate (ASR) and diversity (Diverse-Score $\approx 0.84$), generating up to 100 times more unique prompts (e.g., 10,418 vs. 100 for Ministral-8B-Instruct-2410). Against nine state-of-the-art methods on the HarmBench dataset with twelve LLMs (ten open-source, two closed-source), RainbowPlus achieves an average ASR of 81.1%, surpassing AutoDAN-Turbo by 3.9%, and is 9 times faster (1.45 vs. 13.50 hours). Our open-source implementation fosters further advancements in LLM safety, offering a scalable tool for vulnerability assessment. Code and resources are publicly available at https://github.com/knoveleng/rainbowplus, supporting reproducibility and future research in LLM red-teaming.

Paper Structure

This paper contains 76 sections, 5 theorems, 10 equations, 8 figures, 9 tables, 3 algorithms.

Key Result

Lemma D.7

For Multi-Prompt Rainbow (Definition D.2), when a cell $G[z]$ contains $m$ prompts $\{x_1, x_2, \ldots, x_m\}$, a candidate prompt $x'$ is added to the archive if and only if $p(x', x_i) = x'$ for all $x_i \in G[z]$. This verification requires $m$ pairwise comparisons.
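
The update rule in the lemma can be illustrated with a minimal sketch. It assumes the archive is a dictionary mapping a descriptor to a list of prompts, and that the pairwise preference $p$ is available as a function returning the preferred prompt. The names `multi_prompt_update` and `toy_judge` are hypothetical and not taken from the paper; the length-based judge is only a deterministic stand-in for an LLM judge.

```python
from typing import Callable, Dict, List, Tuple

Prompt = str
Descriptor = Tuple[str, str]  # hypothetical key for a behavioral niche z
# Hypothetical judge playing the role of p(x', x_i): returns the preferred prompt.
Judge = Callable[[Prompt, Prompt], Prompt]


def multi_prompt_update(
    archive: Dict[Descriptor, List[Prompt]],
    z: Descriptor,
    candidate: Prompt,
    judge: Judge,
) -> Tuple[bool, int]:
    """Add `candidate` to cell G[z] only if it beats every prompt already there.

    Returns (accepted, number_of_pairwise_comparisons).
    """
    cell = archive.setdefault(z, [])
    # One judge call per incumbent prompt: m comparisons for a cell of size m.
    wins = [judge(candidate, incumbent) == candidate for incumbent in cell]
    accepted = all(wins)
    if accepted:
        cell.append(candidate)
    return accepted, len(wins)


def toy_judge(a: Prompt, b: Prompt) -> Prompt:
    # Illustrative assumption only: prefer the strictly longer string.
    return a if len(a) > len(b) else b


key = ("niche", "a")
archive = {key: ["aa", "bbb"]}
first = multi_prompt_update(archive, key, "cccc", toy_judge)  # cell had m = 2 prompts
second = multi_prompt_update(archive, key, "c", toy_judge)    # cell now has m = 3 prompts
```

In this sketch the cost of each update grows with the cell size $m$, which is the behavior the lemma quantifies.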

Figures (8)

  • Figure 1: Overview of the RainbowPlus evolutionary pipeline. The iterative process follows a quality-diversity evolutionary search and consists of five stages: (1) Prompt Sampling selects a parent individual (adversarial prompt) and its descriptor from the archive; (2) Candidate Generation acts as a mutation operator, leveraging a Mutator LLM to produce a diverse offspring population of candidate prompts; (3) Diversity Filtering selects behaviorally distinct individuals using a diversity-promoting mechanism; (4) Response Evaluation computes fitness scores for each candidate based on a probabilistic assessment of prompt effectiveness; and (5) Update performs survivor selection by refining the archive with high-fitness, diverse prompts, analogous to niche-based population updates in evolutionary algorithms.
  • Figure 2: Temporal evolution of Attack Success Rate (ASR) for RainbowPlus (standard, $\alpha$, and $\beta$ variants) and Rainbow against Ministral-8B-Instruct-2410 on the AQA dataset over 1,000 iterations, demonstrating RainbowPlus’s faster convergence.
  • Figure 3: t-SNE visualization of the evolution of RainbowPlus’s prompt distribution (iteration 50 in red, iteration 1,000 in blue) against Ministral-8B-Instruct-2410 on the AQA dataset, illustrating progressive diversification.
  • Figure 4: Attack Success Rate (ASR) of standard RainbowPlus against Ministral-8B-Instruct-2410 on the AQA dataset, segmented by risk categories, showing consistent performance across diverse harm types.
  • Figure 5: Rainbow
  • ...and 3 more figures
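
The five stages named in the Figure 1 caption can be sketched as a single iteration over a multi-element archive. This is a sketch under stated assumptions, not the authors' implementation: the Mutator LLM and the fitness scorer are replaced by deterministic stubs, word-set Jaccard similarity stands in for the diversity-promoting mechanism, the fitness threshold is an assumed acceptance rule, and offspring are stored in the parent's niche for simplicity. All function and parameter names are hypothetical, and the prompts are benign placeholders.

```python
import random
from typing import Callable, Dict, List, Tuple

Descriptor = Tuple[str, str]                          # hypothetical niche key
Archive = Dict[Descriptor, List[Tuple[str, float]]]   # niche -> [(prompt, fitness)]


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (assumed stand-in for the diversity measure)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def evolve_step(
    archive: Archive,
    mutate: Callable[[str, int], str],   # stub for the Mutator LLM
    fitness: Callable[[str], float],     # stub returning a score in [0, 1]
    rng: random.Random,
    n_offspring: int = 3,
    max_similarity: float = 0.6,         # assumed diversity cutoff
    threshold: float = 0.5,              # assumed fitness cutoff
) -> int:
    # (1) Prompt Sampling: pick a niche, then a parent prompt stored in it.
    descriptor = rng.choice(sorted(archive))
    parent, _ = rng.choice(archive[descriptor])
    # (2) Candidate Generation: mutate the parent into several offspring.
    offspring = [mutate(parent, i) for i in range(n_offspring)]
    # (3) Diversity Filtering: drop offspring too similar to ones already kept.
    kept: List[str] = []
    for cand in offspring:
        if all(jaccard(cand, k) < max_similarity for k in kept):
            kept.append(cand)
    # (4) Response Evaluation: score every kept candidate independently.
    scored = [(cand, fitness(cand)) for cand in kept]
    # (5) Update: every candidate above the threshold joins the niche.
    survivors = [(cand, f) for cand, f in scored if f >= threshold]
    archive[descriptor].extend(survivors)
    return len(survivors)


SUFFIXES = ["alpha beta gamma", "delta epsilon zeta", "alpha beta gamma"]


def toy_mutate(parent: str, i: int) -> str:
    return f"{parent} {SUFFIXES[i % len(SUFFIXES)]}"


def toy_fitness(prompt: str) -> float:
    return 0.9 if "alpha" in prompt else 0.2


key = ("niche", "a")
archive: Archive = {key: [("placeholder seed", 0.6)]}
added = evolve_step(archive, toy_mutate, toy_fitness, random.Random(0))
```

Because each candidate is scored on its own in stage (4), the update in stage (5) does not compare the candidate against the prompts already stored in the niche, which is the contrast with pairwise archive updates that the summary draws.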

Theorems & Definitions (15)

  • Definition D.1: Standard Rainbow Problem
  • Definition D.2: Multi-Prompt Rainbow Problem
  • Definition D.3: RainbowPlus Problem
  • Remark D.4
  • Remark D.6
  • Lemma D.7: Multi-Prompt Rainbow Update Complexity
  • proof
  • Lemma D.8: RainbowPlus Update Complexity
  • proof
  • Theorem D.9: Multi-Prompt Rainbow Time Complexity
  • ...and 5 more