Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models

Shuyuan Liu; Jiawei Chen; Shouwei Ruan; Hang Su; Zhaoxia Yin

TL;DR

This work addresses the robustness of LLM-based embodied agents to adversarial prompts in multimodal environments by introducing EIRAD, a multimodal dataset with untargeted and targeted attacks, and a BLIP2-based success metric. It proposes prompt suffix initialization and a greedy-like adversarial-suffix optimization to craft attacks, demonstrating higher attack success rates and reduced convergence time across TaPA, Otter, and Llama-2-chat. The findings reveal notable vulnerabilities in the decision-level robustness of embodied LLM systems and motivate the development of defenses to ensure secure, reliable embodied intelligence in real-world settings.

Abstract

Embodied intelligence gives agents rich perception, enabling them to respond in ways closely aligned with real-world situations. Large Language Models (LLMs) interpret language instructions in depth and play a crucial role in generating plans for intricate tasks. LLM-based embodied models therefore further enhance an agent's capacity to comprehend and process information. However, this combination also introduces new challenges. Specifically, attackers can manipulate LLMs into producing irrelevant or even malicious outputs by altering their prompts. We observe a notable absence of the multi-modal datasets needed to comprehensively evaluate the robustness of LLM-based embodied models. Consequently, we construct the Embodied Intelligent Robot Attack Dataset (EIRAD), tailored specifically for robustness evaluation. We also devise two attack strategies, untargeted attacks and targeted attacks, to simulate a range of diverse attack scenarios. To determine more accurately whether an attack on the LLM-based embodied model has succeeded, we devise a new attack success evaluation method utilizing the BLIP2 model. Recognizing that the GCG algorithm is time- and cost-intensive, we devise a scheme for prompt suffix initialization based on various target tasks, thus expediting convergence. Experimental results demonstrate that our method achieves a superior attack success rate when targeting LLM-based embodied models, indicating a lower level of decision-level robustness in these models.
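The claim that task-based suffix initialization speeds up convergence can be illustrated with a toy sketch. This is not the paper's algorithm: GCG proposes token swaps using model gradients, whereas the sketch below substitutes random single-token swaps and a stand-in loss (the number of tokens that differ from a fixed target). The names `greedy_suffix_search`, `toy_loss`, `vocab`, and `target` are hypothetical and exist only for this illustration; no model is involved.

```python
import random

def greedy_suffix_search(score, vocab, init_suffix, steps=50, candidates=8, seed=0):
    """Greedy coordinate search over a token suffix (lower score is better).

    Each step samples a few single-token substitutions and keeps the best
    one only if it improves on the current score. Random sampling here is
    an assumption standing in for GCG's gradient-guided candidate selection.
    """
    rng = random.Random(seed)
    suffix = list(init_suffix)
    best = score(suffix)
    history = [best]
    for _ in range(steps):
        trials = []
        for _ in range(candidates):
            pos = rng.randrange(len(suffix))
            cand = suffix[:pos] + [rng.choice(vocab)] + suffix[pos + 1:]
            trials.append((score(cand), cand))
        cand_loss, cand = min(trials, key=lambda t: t[0])
        if cand_loss < best:  # accept only strict improvements
            best, suffix = cand_loss, cand
        history.append(best)
    return suffix, best, history

# Hypothetical toy setup: the "loss" counts mismatches against a fixed target.
vocab = ["move", "to", "the", "cup", "table", "open", "door", "!"]
target = ["move", "to", "the", "table"]

def toy_loss(suffix):
    return sum(a != b for a, b in zip(suffix, target))

generic_init = ["!"] * 4                   # uninformed starting suffix
task_init = ["move", "!", "the", "!"]      # starting suffix seeded from the task

_, generic_loss, generic_hist = greedy_suffix_search(toy_loss, vocab, generic_init)
_, task_loss, task_hist = greedy_suffix_search(toy_loss, vocab, task_init)
print("generic start/end loss:", generic_hist[0], generic_loss)
print("task-based start/end loss:", task_hist[0], task_loss)
```

In this toy setting the task-seeded suffix simply starts closer to the target, so fewer substitutions are needed; whether the same effect carries over to a real model depends on the loss landscape and is what the paper's experiments measure.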

Paper Structure (21 sections, 3 equations, 8 figures, 3 tables)

Introduction
Related Works
Embodied task planning
Jailbreak attack based on LLM
Method
Dataset analysis and creation process
Data types and statistics.
Description of the dataset creation process.
Embodied scenario attack algorithm
Initialize prompt suffix
Optimize adversarial suffixes
Judgment of attack success
Experiments
Settings
Main results
...and 6 more sections

Figures (8)

Figure 1: Illustration of an embodied intelligence attack. Before being attacked, the embodied intelligent robot performs its tasks normally. After suffering a malicious attack, the robot performs harmful actions.
Figure 2: Data type distribution in EIRAD
Figure 3: Statistics of the multi-modal data. (a) The 10 most frequently prompted verbs in harmless data along with their corresponding noun objects. (b) The 10 most frequently targeted verbs in harmless data along with their corresponding noun objects. (c) The 10 most frequently targeted verbs in harmful data along with their corresponding noun objects.
Figure 4: The creation process of the multi-modal dataset
Figure 5: The framework of the attack algorithm. Attack algorithms are categorized into two main types: untargeted attacks and targeted attacks. The targeted approach builds upon the untargeted one, differing in keyword initialization (step 1), selection of optimal suffixes (step 2), and selection of evaluation objects (step 3).
...and 3 more figures
