MERIT Feedback Elicits Better Bargaining in LLM Negotiators

Jihwan Oh, Murad Aghazada, Yooju Shin, Se-Young Yun, Taehyeon Kim

TL;DR

This work targets the gap between LLM bargaining capabilities and human strategic depth by introducing AgoraBench, a diverse, economically grounded negotiation benchmark, and Merit, a human-aligned utility metric that blends economic and preference-based considerations. It pairs this with a human-preference dataset and a prompting/training pipeline (Merit-guided prompting, ICL-MF, and SFT) to steer LLMs toward opponent-aware, human-aligned negotiation strategies. Empirical results show that conventional profit-oriented strategies diverge from human preferences, whereas Merit-guided approaches improve deal rates and yield more sophisticated, opponent-aware reasoning across varied market regimes (including deception, monopoly, and installment terms). The work demonstrates the potential of utility-based feedback to align LLM bargaining with human values, offering a path toward more reliable and realistic AI negotiators with practical implications for automated bargaining, customer interactions, and economic simulations. Limitations include seller-side dynamics, broader market contexts, and tool-assisted environments, which are highlighted as avenues for future work to further enhance realism and applicability.

Abstract

Bargaining is often regarded as a logical arena rather than an art or a matter of intuition, yet Large Language Models (LLMs) still struggle to navigate it due to limited strategic depth and difficulty adapting to complex human factors. Current benchmarks rarely capture this limitation. To bridge this gap, we present a utility-feedback-centric framework. Our contributions are: (i) AgoraBench, a new benchmark spanning nine challenging settings (e.g., deception, monopoly) that supports diverse strategy modeling; (ii) human-aligned, economically grounded metrics derived from utility theory, operationalized via agent utility, negotiation power, and acquisition ratio, which implicitly measure how well a negotiation aligns with human preferences; and (iii) a human-preference-grounded dataset with a learning pipeline that strengthens LLMs' bargaining ability through both prompting and fine-tuning. Empirical results indicate that baseline LLM strategies often diverge from human preferences, while our mechanism substantially improves negotiation performance, yielding deeper strategic behavior and stronger opponent awareness.

Paper Structure

This paper contains 51 sections, 13 equations, 11 figures, 19 tables.

Figures (11)

  • Figure 1: Negotiation task between LLM agents and AgoraBench overview. (a) represents the simulator, (b) represents the human preference dataset, and (c) represents nine economically grounded market environments, each crafted around a distinct consumer good.
  • Figure 2: Product attributes for negotiation scenarios in the multi-product setting. Each product is assigned a seller cost, an initial offer price, and a buyer's budget.
  • Figure 3: Demonstration that a human-aligned metric outperforms a purely profit-based one.
  • Figure 4: Number of negotiation turns. Dialogues between a gpt-4o buyer and a gemini-1.5-pro seller are fed to a gemma-3-27b judge.
  • Figure 5: Preference comparison of ICL-MF against ReAct and OG-Narrator, as rated by an LLM judge.
  • ...and 6 more figures