Boosting Jailbreak Transferability for Large Language Models

Hanqing Liu; Lifeng Zhou; Huanqian Yan

TL;DR

This work proposes several enhancements to existing jailbreak methods such as GCG: a scenario induction template, optimized suffix selection, and a re-suffix attack mechanism that reduces inconsistent outputs from large language models.

Abstract

Large language models have drawn significant attention to the challenge of safety alignment, especially regarding jailbreak attacks that circumvent security measures to produce harmful content. Existing methods like GCG perform well in single-model attacks but lack transferability. To address this limitation, we propose several enhancements: a scenario induction template, optimized suffix selection, and a re-suffix attack mechanism that reduces inconsistent outputs. Our approach shows superior performance in extensive experiments across various benchmarks, achieving nearly 100% success rates in both attack execution and transferability. Notably, our method won first place in the AISG-hosted Global Challenge for Safe and Secure LLMs. The code is released at https://github.com/HqingLiu/SI-GCG.
