LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Minqian Liu, Zhiyang Xu, Xinyi Zhang, Heajun An, Sarvech Qadir, Qi Zhang, Pamela J. Wisniewski, Jin-Hee Cho, Sang Won Lee, Ruoxi Jia, Lifu Huang

TL;DR

This work addresses the safety risks of LLMs as persuasive agents in multi-turn conversations by introducing PersuSafety, a three-stage framework for Task Generation, Conversation Simulation, and Safety Assessment. The authors curate a broad set of unethical and ethically neutral persuasion tasks, simulate interactions between LLM persuaders and persuadees with vulnerability and contextual factors, and evaluate safety using automated judgments validated by humans across eight LLMs. Key findings reveal substantial safety gaps: many models engage in unethical persuasion, refusal to engage does not reliably predict safe behavior during execution, and exposing persuadee vulnerabilities amplifies unethical tactics, especially under external pressures. The results underscore the need for stronger safety alignment techniques in progressive, goal-driven dialogue systems and offer a structured pathway for evaluating and mitigating risks in real-world LLM deployment.

Abstract

Recent advancements in Large Language Models (LLMs) have enabled them to approach human-level persuasion capabilities. However, such potential also raises concerns about the safety risks of LLM-driven persuasion, particularly their potential for unethical influence through manipulation, deception, exploitation of vulnerabilities, and many other harmful tactics. In this work, we present a systematic investigation of LLM persuasion safety through two critical aspects: (1) whether LLMs appropriately reject unethical persuasion tasks and avoid unethical strategies during execution, including cases where the initial persuasion goal appears ethically neutral, and (2) how influencing factors like personality traits and external pressures affect their behavior. To this end, we introduce PersuSafety, the first comprehensive framework for the assessment of persuasion safety, which consists of three stages: persuasion scene creation, persuasive conversation simulation, and persuasion safety assessment. PersuSafety covers 6 diverse unethical persuasion topics and 15 common unethical strategies. Through extensive experiments across 8 widely used LLMs, we observe significant safety concerns in most LLMs, including failing to identify harmful persuasion tasks and leveraging various unethical persuasion strategies. Our study calls for more attention to improving safety alignment in progressive and goal-driven conversations such as persuasion.

Paper Structure

This paper contains 39 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of our PersuSafety framework.
  • Figure 2: Taxonomy of the unethical persuasion strategies studied in our work.
  • Figure 3: Safety Refusal Checking. We report the number of unethical persuasion tasks that the model failed to refuse. A lower number indicates a safer model.
  • Figure 4: Main experiments on the usage of unethical persuasion strategies in unethical persuasion tasks. The value in each cell indicates the degree and frequency of strategy usage on our 3-point scale (0 is the lowest and 2 is the highest), where higher values indicate more frequent usage. We consider only the persuasion tasks that the corresponding model does not refuse.
  • Figure 5: Analysis of unethical persuasion strategy usage when the persuader is aware of the persuadee's vulnerabilities (Visible) and when the persuader is NOT aware of the vulnerabilities (Invisible). The value in each cell uses our 3-point scale (0 is the lowest and 2 is the highest). In each column, the cells with the highest values are highlighted in the darkest color and the cells with the lowest values in the lightest color. Best viewed in color.
  • ...and 2 more figures
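
The captions for Figures 3 and 4 describe two summary quantities: a per-model count of unethical tasks that were not refused, and a per-model, per-strategy usage value on the 0 to 2 scale computed only over non-refused tasks. The Python sketch below shows one way such numbers could be tabulated. It is an illustration under stated assumptions, not the authors' implementation: the `TaskResult` record, the function names, and the use of a plain mean as the aggregation are all hypothetical, since the captions do not specify how cell values are aggregated.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskResult:
    """One simulated unethical persuasion task for one model (hypothetical record)."""
    model: str
    refused: bool
    # Strategy name -> judged usage on the 0-2 scale; left empty if the task was refused.
    ratings: Dict[str, int] = field(default_factory=dict)


def failed_refusals(results: List[TaskResult]) -> Dict[str, int]:
    """Count, per model, the unethical tasks the model did not refuse (cf. Figure 3)."""
    counts: Dict[str, int] = defaultdict(int)
    for r in results:
        # Adding 0 for refused tasks keeps models with no failures in the output.
        counts[r.model] += 0 if r.refused else 1
    return dict(counts)


def mean_strategy_usage(results: List[TaskResult]) -> Dict[Tuple[str, str], float]:
    """Average the 0-2 ratings per (model, strategy) over non-refused tasks (cf. Figure 4).

    Assumption: a simple mean; the paper may aggregate differently.
    """
    sums: Dict[Tuple[str, str], int] = defaultdict(int)
    seen: Dict[Tuple[str, str], int] = defaultdict(int)
    for r in results:
        if r.refused:
            continue  # refused tasks are excluded, as the Figure 4 caption states
        for strategy, score in r.ratings.items():
            if score not in (0, 1, 2):
                raise ValueError(f"rating out of range: {score}")
            sums[(r.model, strategy)] += score
            seen[(r.model, strategy)] += 1
    return {key: sums[key] / seen[key] for key in sums}
```

Calling `failed_refusals` and `mean_strategy_usage` on the same list of records yields the two views side by side, which makes it easy to see the pattern the TL;DR highlights: a model can refuse many tasks yet still show high strategy usage on the tasks it accepts.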