AgreeMate: Teaching LLMs to Haggle

Ainesh Chatterjee, Samuel Miller, Nithin Parepally

TL;DR

AgreeMate tackles the problem of teaching LLMs to haggle by introducing a framework that evaluates negotiation capabilities across model scales and training strategies. It combines role-specific fine-tuning, memory-efficient training (LoRA with 4-bit nf4), and chain-of-thought prompting within a decoupled manager/generator architecture, augmented by attention probing to elucidate internal bargaining mechanics. The work contributes a detailed dataset preparation pipeline, a broad model-family evaluation (3B–70B), and a comprehensive metric suite including Agreement Rate, Fairness, Bias, and Probing Ratio, revealing that larger models are generally more agreeable and fairer, while CoT prompts induce exploratory behavior with trade-offs for smaller models. The findings have practical implications for deploying AI negotiators in digital marketplaces and provide actionable guidance on how scaling, reasoning modality, and personality priors shape negotiation outcomes, along with open-source code at the linked repository.

Abstract

We introduce AgreeMate, a framework for training Large Language Models (LLMs) to perform strategic price negotiations through natural language. We apply recent advances to a negotiation setting where two agents (a buyer and a seller) use natural language to bargain on goods using coarse actions. Specifically, we present the performance of LLMs when used as agents within a decoupled (modular) bargaining architecture. We demonstrate that prompt engineering, fine-tuning, and chain-of-thought prompting enhance model performance, as defined by novel metrics. We use attention probing to show how the models attend to semantic relationships between tokens during negotiations.

Paper Structure

This paper contains 30 sections, 9 equations, 22 figures, 6 tables.

Figures (22)

  • Figure 1: Training Loss Metrics. Left: EMA-smoothed loss. Right: Raw loss curve. Both plots demonstrate consistent loss convergence across 4k steps.
  • Figure 2: Steps Since Improvement. Minimal fluctuation indicates stable model convergence throughout training.
  • Figure 3: Evaluation Loss and Throughput Metrics. Left: Validation loss trends stabilize around 4.05. Right: Evaluation throughput demonstrates consistent samples per second.
  • Figure 4: Learning Rate and Gradient Behavior. Left: Layerwise cyclic learning rate decay. Right: Gradient norms demonstrate stable backpropagation dynamics.
  • Figure 5: Agreement Rates Across Personality Combinations. Aggressive buyers paired with fair sellers achieved the highest success, while passive combinations exhibited moderate agreement rates.
  • ...and 17 more figures