Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method
Tian Xia, Zhiwei He, Tong Ren, Yibo Miao, Zhuosheng Zhang, Yang Yang, Rui Wang
TL;DR
This work evaluates the bargaining abilities of Large Language Model (LLM) agents. It formalizes bargaining as an asymmetric incomplete-information game and introduces AmazonHistoryPrice, a dataset of real product prices covering 930 items across 18 categories. On this basis it defines a Bargaining benchmark with a Rubinstein-inspired process, two session types (Mutual Interest and Conflicting Interest), and metrics based on Normalized Profits, SNP, and Shares that quantify Buyer and Seller performance. Experiments comparing multiple LLMs show that Buyers underperform Sellers and that larger models do not substantially improve Buyer outcomes. To address this, the authors propose OG-Narrator, a simple pipeline in which a deterministic Offer Generator produces the offers and an LLM Narrator renders them as natural-language dialogue. The method substantially boosts Buyer performance, even enables unaligned models to bargain effectively, and exposes vulnerabilities in ChatGPT acting as a Seller under adversarial bargaining. Overall, the work advances evaluation methodology for AI bargaining, demonstrates practical limitations of current models, and offers a scalable technique for strengthening Buyers in price negotiation, with potential real-world impact for autonomous agents.
Abstract
Bargaining is an important and unique part of negotiation between humans. As LLM-driven agents learn to negotiate and act like real humans, how to evaluate their bargaining abilities remains an open problem. For the first time, we formally describe the Bargaining task as an asymmetric incomplete-information game, defining the gains of the Buyer and the Seller across multiple bargaining processes. This allows us to quantitatively assess an agent's performance in the Bargaining task. We collected a real product price dataset, AmazonHistoryPrice, and evaluated the bargaining abilities of various LLM agents. We find that playing the Buyer is much harder than playing the Seller, and that increasing model size cannot effectively improve the Buyer's performance. To address this challenge, we propose a novel approach called OG-Narrator, which integrates a deterministic Offer Generator that controls the price range of the Buyer's offers and an LLM Narrator that creates natural-language sentences for the generated offers. Experimental results show that OG-Narrator raises the Buyer's deal rate from 26.67% to 88.88% and yields a tenfold increase in profits across all baselines, even for a model that has not been aligned.
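The abstract describes OG-Narrator at the architectural level: a deterministic component fixes the price of each Buyer offer, and the LLM only phrases it. The Python sketch below illustrates that division of labor under stated assumptions. The linear concession schedule, the `start_ratio` and `num_steps` parameters, and the template-based `template_narrator` are hypothetical illustrations, not the paper's actual offer rule or prompt; a real Narrator would call an LLM with the dialogue history.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical Narrator interface: maps (offer, dialogue history) to an utterance.
Narrator = Callable[[float, List[str]], str]


@dataclass
class OfferGenerator:
    """Hypothetical deterministic offer schedule (not the paper's exact rule).

    Opens at a fraction of the Seller's listing price and concedes in equal
    steps, never exceeding the Buyer's budget.
    """
    start_ratio: float = 0.5  # assumed opening offer as a fraction of the listing price
    num_steps: int = 5        # assumed number of offers; the last one equals the budget

    def schedule(self, listing_price: float, budget: float) -> List[float]:
        # The opening offer is capped at the budget so no offer can exceed it.
        opening = min(self.start_ratio * listing_price, budget)
        if self.num_steps <= 1:
            return [round(opening, 2)]
        step = (budget - opening) / (self.num_steps - 1)
        return [round(opening + i * step, 2) for i in range(self.num_steps)]


def template_narrator(offer: float, history: List[str]) -> str:
    """Stand-in for the LLM Narrator; it ignores the history and fills a template."""
    return f"I can offer ${offer:.2f} for this item."


def buyer_turn(
    generator: OfferGenerator,
    narrator: Narrator,
    listing_price: float,
    budget: float,
    round_idx: int,
    history: List[str],
) -> Tuple[float, str]:
    """The price comes from the generator; the narrator only verbalizes it."""
    offers = generator.schedule(listing_price, budget)
    # After the schedule is exhausted, keep repeating the final (budget) offer.
    offer = offers[min(round_idx, len(offers) - 1)]
    return offer, narrator(offer, history)


if __name__ == "__main__":
    gen = OfferGenerator(start_ratio=0.5, num_steps=5)
    history: List[str] = []
    for round_idx in range(3):
        offer, utterance = buyer_turn(gen, template_narrator, 100.0, 80.0, round_idx, history)
        history.append(utterance)
        print(utterance)
```

In this sketch the LLM never chooses a number, so the Buyer cannot be talked into an offer above its budget; replacing `template_narrator` with an LLM call would change only the wording of each turn, which mirrors the separation of concerns the abstract describes.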
