Measuring Fine-Grained Negotiation Tactics of Humans and LLMs in Diplomacy

Wenkai Li, Lynnette Hui Xian Ng, Andy Liu, Daniel Fried

TL;DR

This work advances the study of negotiation by measuring fine-grained tactics in Diplomacy rather than win rates alone. It introduces an LLM-as-a-judge pipeline to annotate eight tactics grounded in the ethos-pathos-logos framework, links these tactics to short-term (supply center gain, SCG) and long-term (win/loss) outcomes using the It Takes Two and WebDiplomacy datasets, and examines how to align LLM negotiation styles with human behavior via LoRA-based supervised fine-tuning. Two findings stand out: specific tactics, particularly Game-Move and social cues such as Rapport, robustly predict short-term success, and winners systematically use more tactics. Alignment narrows the gap between LLM and human tactics, though some socio-emotional nuances remain challenging. The results offer a scalable framework for quantifying and steering AI negotiation behavior toward more human-like, socially aware patterns, with implications for designing AI negotiators that balance tactical effectiveness and relational dynamics.

Abstract

The study of negotiation styles dates back to Aristotle's ethos-pathos-logos rhetoric. Prior efforts primarily studied the success of negotiation agents. Here, we shift the focus towards the styles of negotiation strategies. Our focus is the strategic dialogue board game Diplomacy, which affords rich natural language negotiation and measures of game success. We used LLM-as-a-judge to annotate a large human-human set of Diplomacy games for fine-grained negotiation tactics from a sociologically-grounded taxonomy. Using a combination of the It Takes Two and WebDiplomacy datasets, we demonstrate the reliability of our LLM-as-a-Judge framework and show strong correlations between negotiation features and success in the Diplomacy setting. Lastly, we investigate the differences between LLM and human negotiation strategies and show that fine-tuning can steer LLM agents toward more human-like negotiation behaviors.

Paper Structure

This paper contains 57 sections, 4 equations, 20 figures, 15 tables.

Figures (20)

  • Figure 1: Methodology Overview: Our pipeline consists of three stages: (1) Reliable tactic annotation. We first annotate negotiation tactics with an LLM-as-a-Judge and validate its reliability on the It Takes Two dataset by computing agreement with expert annotators. (2) Linking tactics to outcomes. Using real human communications and game logs from WebDiplomacy, we study how annotated negotiation tactics relate to performance, analyzing short-term correlations and long-term win/loss outcomes. (3) Aligning LLMs with humans. We perform supervised fine-tuning on filtered WebDiplomacy interactions to align LLM negotiation style with human tactics and quantify the LLM–human tactic distance.
  • Figure 2: Gwet’s AC1 agreement scores per negotiation tactic across models and prompting methods when compared against the expert gold standard. The dashed red line indicates the threshold for moderate agreement ($\text{AC1}=0.61$), while the dashed green line indicates substantial agreement ($\text{AC1}=0.8$).
  • Figure 3: Correlation between annotated negotiation features and supply center gain.
  • Figure 4: Number of negotiation tactics per year across supply center count
  • Figure 5: Model-Human Distance
  • ...and 15 more figures
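
The Figure 2 caption reports Gwet's AC1 between LLM-judge labels and an expert gold standard. As a rough illustration of what that statistic measures, the sketch below implements the standard two-rater AC1 formula for categorical labels. It is not the authors' code: the function name `gwet_ac1`, the binary "tactic present / absent" encoding, and the toy label lists are all assumptions made for this example.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories=(0, 1)):
    """Two-rater Gwet's AC1 for categorical labels (hypothetical helper).

    rater_a, rater_b: equal-length sequences of labels, one per item.
    categories: the full label set; binary presence/absence is assumed here.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rater_a and rater_b must be non-empty and equal length")
    if len(categories) < 2:
        raise ValueError("AC1 needs at least two categories")

    n = len(rater_a)
    k = len(categories)

    # Observed agreement: share of items on which both raters give the same label.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: pi_c is the mean of the two raters' marginal
    # proportions for category c; p_e = sum(pi_c * (1 - pi_c)) / (k - 1).
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(
        pi * (1 - pi)
        for pi in ((counts_a[c] + counts_b[c]) / (2 * n) for c in categories)
    ) / (k - 1)

    return (p_a - p_e) / (1 - p_e)


# Toy labels (made up for illustration): 1 = tactic present, 0 = absent.
judge_labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
expert_labels = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
score = gwet_ac1(judge_labels, expert_labels)
print(round(score, 3))  # 0.817
```

In this toy case the raters agree on 9 of 10 items, the chance-agreement term is 0.455, and AC1 is (0.9 - 0.455) / (1 - 0.455) ≈ 0.817, which would sit just above the 0.8 line drawn in Figure 2.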