Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models
Nimet Beyza Bozdag, Shuhaib Mehri, Gokhan Tur, Dilek Hakkani-Tür
TL;DR
PMIYC presents an automated, multi-agent framework that quantifies persuasive effectiveness and susceptibility in LLMs through back-and-forth Persuader/Persuadee conversations in both subjective and misinformation contexts. It defines a normalized change in agreement, $\text{NC}(c)$, to enable cross-model comparisons, and it examines both single-turn and multi-turn interactions in a round-robin setup. Key findings show that larger models such as Llama-3.3-70B-Instruct and GPT-4o are strong persuaders, that GPT-4o is substantially more resistant than Llama-3.3-70B-Instruct to persuasion toward misinformation, and that multi-turn exchanges amplify persuasive effects. The framework is validated against human annotations and prior work, offering a scalable alternative to human evaluation. By revealing how argument quality and belief alignment influence persuasion dynamics, it also provides insights for safer AI deployment. Overall, PMIYC contributes a practical method for systematically studying persuasion in LLMs, with implications for alignment, safety, and responsible AI governance.
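The normalized change in agreement and the round-robin design are only named here, so the following Python sketch is one possible reading, not the paper's definition. It assumes, hypothetically, that the Persuadee reports agreement on a 1-5 Likert scale. A shift toward the Persuader's stance is divided by the room left in that direction, and a shift away is divided by the room in the opposite direction, so scores fall in [-1, 1]. Every ordered pair of distinct models is evaluated. `normalized_change`, `round_robin_scores`, and the `run_conversation` callable are illustrative names, not the authors' code.

```python
from itertools import permutations
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 5  # assumed 1-5 agreement scale (hypothetical)


def normalized_change(initial: int, final: int, supports_claim: bool) -> float:
    """Hypothetical NC: signed shift toward the Persuader's stance, scaled to [-1, 1]."""
    # Positive shift = the Persuadee moved toward the Persuader's side.
    shift = (final - initial) if supports_claim else (initial - final)
    # Room left to move toward / away from the Persuader's side of the scale.
    toward = (LIKERT_MAX - initial) if supports_claim else (initial - LIKERT_MIN)
    away = (LIKERT_MAX - LIKERT_MIN) - toward
    if shift > 0:
        return shift / toward  # toward > 0 whenever shift > 0
    if shift < 0:
        return shift / away  # away > 0 whenever shift < 0
    return 0.0


def round_robin_scores(models, run_conversation):
    """Score every ordered (persuader, persuadee) pair of distinct models.

    Requires at least two models. run_conversation is a hypothetical callable
    returning (initial_agreement, final_agreement, supports_claim).
    """
    as_persuader = {m: [] for m in models}
    as_persuadee = {m: [] for m in models}
    for persuader, persuadee in permutations(models, 2):
        nc = normalized_change(*run_conversation(persuader, persuadee))
        as_persuader[persuader].append(nc)
        as_persuadee[persuadee].append(nc)
    # Mean NC as Persuader ~ effectiveness; mean NC as Persuadee ~ susceptibility.
    effectiveness = {m: mean(v) for m, v in as_persuader.items()}
    susceptibility = {m: mean(v) for m, v in as_persuadee.items()}
    return effectiveness, susceptibility
```

Under this normalization, a Persuadee that moves from 2 to 4 in support of a claim scores 2/3, because only three points of movement toward the Persuader's side were available. Normalizing by the remaining room is one way to make models with different starting positions comparable.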
Abstract
Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion. While these capabilities can be used for social good, they also present risks of potential misuse. Moreover, LLMs' susceptibility to persuasion raises concerns about alignment with ethical principles. To study these dynamics, we introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasion through multi-agent interactions. In PMIYC, Persuader agents engage in multi-turn conversations with Persuadee agents, allowing us to measure LLMs' persuasive effectiveness and their susceptibility to persuasion. We conduct comprehensive evaluations across diverse LLMs, ensuring each model is assessed against the others in both subjective and misinformation contexts. We validate the efficacy of our framework through human evaluations and show that its results align with prior work. PMIYC offers a scalable alternative to human annotation for studying persuasion in LLMs. Through PMIYC, we find that Llama-3.3-70B and GPT-4o exhibit similar persuasive effectiveness, outperforming Claude 3 Haiku by 30%. However, GPT-4o demonstrates over 50% greater resistance to persuasion for misinformation compared to Llama-3.3-70B. These findings provide empirical insights into the persuasive dynamics of LLMs and contribute to the development of safer AI systems.
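To make the Persuader/Persuadee protocol more concrete, here is a minimal Python sketch of a single conversation. It rests on hypothetical assumptions: agents are plain callables (in practice each would wrap an LLM call), the Persuadee returns a reply together with an agreement score after every turn, and the dialogue runs for a fixed number of turns. `run_dialogue`, `Transcript`, and the agent signatures are illustrative names, not the framework's actual interface. Setting `max_turns=1` gives a single-turn exchange.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical agent interfaces; real agents would wrap LLM calls.
PersuaderFn = Callable[[str, List[str]], str]              # (claim, history) -> argument
PersuadeeFn = Callable[[str, List[str]], Tuple[str, int]]  # -> (reply, agreement score)


@dataclass
class Transcript:
    claim: str
    messages: List[str] = field(default_factory=list)
    # Agreement before the dialogue, then after each Persuadee turn.
    agreement: List[int] = field(default_factory=list)


def run_dialogue(claim: str, persuader: PersuaderFn, persuadee: PersuadeeFn,
                 initial_agreement: int, max_turns: int = 3) -> Transcript:
    t = Transcript(claim, agreement=[initial_agreement])
    for _ in range(max_turns):
        # Persuader argues, seeing the full conversation so far.
        argument = persuader(claim, t.messages)
        t.messages.append(argument)
        # Persuadee responds and reports its updated agreement.
        reply, score = persuadee(claim, t.messages)
        t.messages.append(reply)
        t.agreement.append(score)
    return t
```

The first and last entries of `t.agreement` could then be fed into a normalized-change score, while the intermediate entries show how agreement evolves over a multi-turn exchange.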
