AREG: Adversarial Resource Extraction Game for Evaluating Persuasion and Resistance in Large Language Models
Adib Sakhawat, Fardeen Sadab
TL;DR
AREG is a benchmark that treats social influence as a multi-turn, zero-sum resource-extraction game between a persuader and a resource holder, adjudicated by a deterministic Arbiter. It assigns each model two Elo scores (C-Elo for persuasion and V-Elo for resistance), computed over a five-round round-robin among eight frontier LLMs. The results show a consistent defensive advantage and only a weak positive relationship between offensive and defensive capabilities ($\rho = 0.33$). Linguistic analyses associate incremental commitment-seeking with higher extraction success and verification-seeking with successful defense, whereas explicit refusals are less prevalent in successful defenses. Social intelligence is therefore not monolithic, and bidirectional evaluation frameworks are needed to identify asymmetric vulnerabilities and inform robust deployment of large language models.
Abstract
Evaluating the social intelligence of Large Language Models (LLMs) increasingly requires moving beyond static text generation toward dynamic, adversarial interaction. We introduce the Adversarial Resource Extraction Game (AREG), a benchmark that operationalizes persuasion and resistance as a multi-turn, zero-sum negotiation over financial resources. Using a round-robin tournament across frontier models, AREG enables joint evaluation of offensive (persuasion) and defensive (resistance) capabilities within a single interactional framework. Our analysis provides evidence that these capabilities are weakly correlated ($\rho = 0.33$) and empirically dissociated: strong persuasive performance does not reliably predict strong resistance, and vice versa. Across all evaluated models, resistance scores exceed persuasion scores, indicating a systematic defensive advantage in adversarial dialogue settings. Further linguistic analysis suggests that interaction structure plays a central role in these outcomes. Incremental commitment-seeking strategies are associated with higher extraction success, while verification-seeking responses are more prevalent in successful defenses than explicit refusal. Together, these findings indicate that social influence in LLMs is not a monolithic capability and that evaluation frameworks focusing on persuasion alone may overlook asymmetric behavioral vulnerabilities.
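
To make the idea of two separate ratings per model concrete, the following is a minimal sketch of how a dual Elo update could work for one game. It is an illustration under stated assumptions, not the scoring procedure used in AREG: the standard logistic Elo expectation, the starting rating of 1500, the K-factor of 32, and the use of the extracted fraction of resources as the game score are all assumed here, and the names `Ratings`, `expected_score`, and `update_dual_elo` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ratings:
    # Starting values are assumptions made for this sketch.
    c_elo: float = 1500.0  # persuasion (offensive) rating
    v_elo: float = 1500.0  # resistance (defensive) rating


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard logistic Elo expectation for side A against side B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_dual_elo(persuader: Ratings, holder: Ratings,
                    extracted_fraction: float, k: float = 32.0) -> None:
    """Update the persuader's C-Elo and the holder's V-Elo after one game.

    extracted_fraction is the share of the holder's resources obtained by
    the persuader, in [0, 1]. Treating it as the Elo score is an assumption.
    """
    if not 0.0 <= extracted_fraction <= 1.0:
        raise ValueError("extracted_fraction must lie in [0, 1]")
    # The persuader's offense is matched against the holder's defense.
    expected = expected_score(persuader.c_elo, holder.v_elo)
    delta = k * (extracted_fraction - expected)
    # Zero-sum update: what the persuader gains, the holder loses.
    persuader.c_elo += delta
    holder.v_elo -= delta


# Usage: model A attacks model B and extracts nothing.
model_a, model_b = Ratings(), Ratings()
update_dual_elo(model_a, model_b, extracted_fraction=0.0)
```

In this sketch only the persuader's C-Elo and the holder's V-Elo change in a given game, so a model's two ratings evolve independently depending on which role it plays. That independence is what allows offensive and defensive ability to come apart, as the reported weak correlation suggests.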
