Assessing Adversarial Robustness of Large Language Models: An Empirical Study

Zeyu Yang; Zhao Meng; Xiaochen Zheng; Roger Wattenhofer

TL;DR

The paper presents a novel white-box-style attack that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5, and establishes a new benchmark for LLM robustness.

Abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box-style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on these models' resistance to adversarial perturbations. Our comprehensive evaluation across five diverse text classification tasks establishes a new benchmark for LLM robustness. The findings of this study have far-reaching implications for the reliable deployment of LLMs in real-world applications and contribute to the advancement of trustworthy AI systems.

Paper Structure (30 sections, 7 figures, 13 tables)

Introduction
Related Work
The Evaluation of LLMs
Robustness in NLP
Preliminaries
Open-source Large Language Models
Fine-tuning Techniques
LoRA
Quantization
QLoRA
Methods
Adversarial Attack
Geometry Attack Methodology
Experiment Settings
Experiment Pipeline
...and 15 more sections

Figures (7)

Figure 1: The framework of our adversarial robustness assessment
Figure 2: The experimental results of different models on various datasets.
Figure 3: The experimental results of T5 and Flan-T5 on IMDB dataset
Figure 4: Different precisions on the IMDB dataset (T5 model)
Figure 5: Different precisions on the IMDB dataset (OPT model)
...and 2 more figures
