Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Fan Yang

TL;DR

This work designs three perturbation strategies (multi-stream interleaving, inversion perturbation, and shape transformation) that disrupt the thinking process through concurrent task interleaving, character reversal, and format constraints, respectively. The resulting attack achieves higher attack success rates than most compared methods across mainstream models.

Abstract

The widespread adoption of thinking mode in large language models (LLMs) has significantly enhanced their ability to handle complex tasks, but it also introduces new security risks: under jailbreak attacks, the step-by-step reasoning process may lead models to generate more detailed harmful content. We observe that thinking mode is uniquely vulnerable when processing multiple interleaved tasks. Based on this observation, we propose the multi-stream perturbation attack, which generates superimposed interference by interweaving multiple task streams within a single prompt. We design three perturbation strategies (multi-stream interleaving, inversion perturbation, and shape transformation) that disrupt the thinking process through concurrent task interleaving, character reversal, and format constraints, respectively. On the JailbreakBench, AdvBench, and HarmBench datasets, our method achieves higher attack success rates than most compared methods across mainstream models, including the Qwen3 series, DeepSeek, Qwen3-Max, and Gemini 2.5 Flash. Experiments show that thinking collapse rates and response repetition rates reach up to 17% and 60%, respectively, indicating that multi-stream perturbation not only bypasses safety mechanisms but can also cause the thinking process to collapse or produce repetitive outputs.

Paper Structure

This paper contains 26 sections, 4 equations, 11 figures, and 8 tables.

Introduction
Related Work
Thinking Mode in Large Language Models
Jailbreak Attack
Multi-Stream Perturbation Attack Method
Problem Setting and Basic Framework
Perturbation Strategies
Experiment
Experimental Setup
Attack Success Rate Experimental Results
Thinking Attack Experimental Results
Ablation Study
Harmful Content Detection Defense
Conclusion
Ethics Statement
...and 11 more sections

Figures (11)

Figure 1: Framework of multi-stream perturbation attack.
Figure 2: ASR comparison of seven attack methods on Qwen3 series models.
Figure 3: ASR comparison of five attack methods on DeepSeek, Qwen3-Max, and Gemini 2.5 Flash on the JailbreakBench dataset.
Figure 4: Thinking length comparison of five attack methods on Qwen3 series models.
Figure 5: Thinking length comparison of five attack methods on DeepSeek and Qwen3-Max on the JailbreakBench dataset.
...and 6 more figures
