MERIT Feedback Elicits Better Bargaining in LLM Negotiators
Jihwan Oh, Murad Aghazada, Yooju Shin, Se-Young Yun, Taehyeon Kim
TL;DR
This work targets the gap between LLM bargaining capabilities and human strategic depth by introducing AgoraBench, a diverse, economically grounded negotiation benchmark, and Merit, a human-aligned utility metric that blends economic and preference-based considerations. It pairs these with a human-preference dataset and a prompting/training pipeline (Merit-guided prompting, ICL-MF, and SFT) to steer LLMs toward opponent-aware, human-aligned negotiation strategies. Empirical results show that conventional profit-oriented strategies diverge from human preferences, whereas Merit-guided approaches improve deal rates and yield more sophisticated, opponent-aware reasoning across varied market regimes (including deception, monopoly, and installment terms). The work demonstrates the potential of utility-based feedback to align LLM bargaining with human values, offering a path toward more reliable and realistic AI negotiators with practical implications for automated bargaining, customer interactions, and economic simulations. Seller-side dynamics, broader market contexts, and tool-assisted environments are noted as limitations and highlighted as avenues for future work to further enhance realism and applicability.
Abstract
Bargaining is often regarded as a logical arena rather than an art or a matter of intuition, yet Large Language Models (LLMs) still struggle to navigate it due to limited strategic depth and difficulty adapting to complex human factors. Current benchmarks rarely capture this limitation. To bridge this gap, we present a utility-feedback-centric framework. Our contributions are: (i) AgoraBench, a new benchmark spanning nine challenging settings (e.g., deception, monopoly) that supports diverse strategy modeling; (ii) human-aligned, economically grounded metrics derived from utility theory, operationalized via agent utility, negotiation power, and acquisition ratio, which implicitly measure how well a negotiation aligns with human preferences; and (iii) a human-preference-grounded dataset with a learning pipeline that strengthens LLMs' bargaining ability through both prompting and finetuning. Empirical results indicate that baseline LLM strategies often diverge from human preferences, while our mechanism substantially improves negotiation performance, yielding deeper strategic behavior and stronger opponent awareness.
