Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

Guoxin Shi; Haoyu Wang; Zaihui Yang; Yuxing Wang; Yongzhe Chang

TL;DR

Experiments show that the Evolutionary Attacker substantially increases red-teaming jailbreak attack success rate (ASR), while the Adaptive Defender improves robustness and generalization across benchmarks with higher data efficiency, without inducing excessive benign refusal, and remains compatible with inference-time defenses such as AdaShield.

Abstract

Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on static adversarial settings, which fundamentally limit robustness, particularly in multimodal settings where the attack surface is larger. In this work, we move beyond static adversarial supervision and introduce co-evolutionary alignment with evolving attacks, instantiated by CEMMA (Co-Evolutionary Multi-Modal Alignment), an automated and adaptive framework for multimodal safety alignment. We introduce an Evolutionary Attacker that decomposes adversarial prompts into method templates and harmful intents. By applying genetic operators (mutation, crossover, and differential evolution), it enables simple seed attacks to inherit the structural efficacy of sophisticated jailbreaks. A companion Adaptive Defender is iteratively updated on the synthesized hard negatives, forming a closed loop in which alignment adapts to evolving attacks. Experiments show that the Evolutionary Attacker substantially increases red-teaming jailbreak attack success rate (ASR), while the Adaptive Defender improves robustness and generalization across benchmarks with higher data efficiency, without inducing excessive benign refusal, and remains compatible with inference-time defenses such as AdaShield.
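To make the template/intent decomposition and the three genetic operators concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation: we assume a method template can be encoded as a set of abstract structural components (the hypothetical tokens `c0`…`c7`), keep the intent as an opaque placeholder string, and use a set-based discrete analogue of differential evolution of our own choosing. The names `Genome`, `mutate`, `crossover`, and `differential` are hypothetical. The key property illustrated is that crossover and differential evolution change only the template while the child keeps the seed's intent, which is how a simple seed can inherit structure from a richer donor.

```python
import random
from dataclasses import dataclass

# Hypothetical pool of abstract structural components a method template may
# combine (a stand-in for richer prompt or image structures).
COMPONENTS = [f"c{i}" for i in range(8)]


@dataclass(frozen=True)
class Genome:
    """An attack decomposed into a method template and an intent."""
    template: frozenset  # structural components of the method template
    intent: str          # opaque placeholder for the intent, kept separate from structure


def mutate(g: Genome, rng: random.Random) -> Genome:
    """Toggle one component: add an unused one or drop an existing one."""
    unused = [c for c in COMPONENTS if c not in g.template]
    if unused and (len(g.template) <= 1 or rng.random() < 0.5):
        new = g.template | {rng.choice(unused)}
    else:
        new = g.template - {rng.choice(sorted(g.template))}
    return Genome(frozenset(new), g.intent)


def crossover(seed: Genome, donor: Genome, rng: random.Random) -> Genome:
    """Uniform crossover on templates; the child keeps the seed's intent."""
    child = set(seed.template & donor.template)  # shared components always inherited
    for c in sorted(seed.template ^ donor.template):  # components in only one parent
        if rng.random() < 0.5:
            child.add(c)
    if not child:  # keep the template non-empty
        child.add(rng.choice(sorted(seed.template | donor.template)))
    return Genome(frozenset(child), seed.intent)


def differential(base: Genome, b: Genome, c: Genome,
                 rng: random.Random, f: float = 0.5) -> Genome:
    """Set-based analogue of x_base + F * (x_b - x_c): each component in b but
    not in c is added with probability f; each in c but not in b is removed
    with probability f."""
    add = {x for x in sorted(b.template - c.template) if rng.random() < f}
    drop = {x for x in sorted(c.template - b.template) if rng.random() < f}
    trial = (base.template | add) - drop
    return Genome(frozenset(trial or base.template), base.intent)


if __name__ == "__main__":
    rng = random.Random(0)
    seed = Genome(frozenset({"c0"}), "intent_A")               # simple seed attack
    donor = Genome(frozenset({"c1", "c2", "c3"}), "intent_B")  # richer template
    child = crossover(seed, donor, rng)
    print(sorted(child.template), child.intent)  # intent stays "intent_A"
```

Encoding templates as sets keeps every operator closed over valid genomes and makes "inheriting structure" a simple set operation; a real system operating on text and images would need operators that edit prompts directly.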

Paper Structure

This paper contains 39 sections, 5 equations, 4 figures, 3 tables, and 1 algorithm.

Introduction
Related Work
Multimodal Jailbreaking
Automated Red Teaming and Safety Alignment
Evolutionary Algorithms for Prompt Optimization
Methodology
Evolutionary Attacker
Attack Evolution Framework
Genetic Operators
CEMMA: Co-Evolutionary Alignment Framework
Co-Evolutionary Adversarial Alignment Framework
Characteristics of the Framework
Experiments
Experiment 1: Evolutionary Attacker Evaluation
Setup.
...and 24 more sections

Figures (4)

Figure 1: Overview of CEMMA. CEMMA operates in a loop. The Attacker (left) evolves a population of attack prompts to increase jailbreak success. The Defender (right) is iteratively updated using the successful attacks collected in each round together with benign data.
Figure 2: Operator ablation on SafeBench. Left: overall ASR over rounds. Right: per-seed-family ASR trends under different operator subsets.
Figure 3: Examples of the three genetic operators used in CEMMA.
Figure 4: Single-family evolution trajectories for FigStep, QR, and MML-WR.
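The round-based loop in Figure 1 can be illustrated with a deliberately tiny toy, built entirely on our own assumptions rather than on the paper's components. Prompts are modeled as sets of hypothetical abstract components; the "defender" is a component blocklist standing in for parameter updates; an attack "succeeds" simply by not being refused, in place of a judge model's verdict; and benign data constrains the update so that benign prompts stay answerable. All names (`co_evolve`, `update_defender`, `BENIGN`, and so on) are hypothetical.

```python
import random

# Hypothetical abstract components; prompts are modeled as sets of them.
COMPONENTS = [f"c{i}" for i in range(10)]
# Hypothetical benign prompts the defender must keep answering.
BENIGN = [frozenset({"c0", "c1"}), frozenset({"c1", "c2"})]


def is_refused(prompt: frozenset, blocked: set) -> bool:
    """Toy defender: refuse any prompt containing a blocked component."""
    return bool(prompt & blocked)


def evolve(population: list, rng: random.Random, n_children: int) -> list:
    """Toy attacker step: each child toggles one component of a random parent."""
    children = []
    for _ in range(n_children):
        parent = rng.choice(population)
        child = parent ^ {rng.choice(COMPONENTS)}
        children.append(child or parent)  # never emit an empty prompt
    return population + children


def update_defender(blocked: set, successes: list, benign: list) -> set:
    """Toy stand-in for fine-tuning: learn components of successful attacks
    that never appear in benign data, so benign prompts are not refused."""
    benign_components = set().union(*benign)
    for s in successes:
        blocked |= s - benign_components
    return blocked


def co_evolve(rounds: int = 5, seed: int = 0, pop_size: int = 8):
    rng = random.Random(seed)
    population = [frozenset({rng.choice(COMPONENTS)}) for _ in range(pop_size)]
    blocked: set = set()
    history = []
    for _ in range(rounds):
        candidates = evolve(population, rng, n_children=2 * pop_size)
        # Attacks that get past the current defender are this round's hard negatives.
        successes = [p for p in candidates if not is_refused(p, blocked)]
        history.append(len(successes) / len(candidates))  # per-round toy ASR
        blocked = update_defender(blocked, successes, BENIGN)
        # Attacker selection: keep successful attacks if any, else any candidates.
        pool = successes or candidates
        population = rng.sample(pool, min(len(pool), pop_size))
    return history, blocked


if __name__ == "__main__":
    history, blocked = co_evolve()
    print("ASR per round:", [round(a, 2) for a in history])
    print("blocked components:", sorted(blocked))
```

In this toy, excluding benign components from the blocklist is what keeps benign refusal at zero by construction; it loosely mirrors the figure's description of updating the Defender with benign data alongside successful attacks, but says nothing about how the paper balances the two.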
