AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions

Xianyang Liu, Shangding Gu, Dawn Song

TL;DR

Benchmarking state-of-the-art proprietary and open-weight LLMs reveals substantial gaps in negotiation performance and highlights challenges in long-horizon strategic reasoning, establishing AgenticPay as a foundation for studying agentic commerce and language-based market interaction.

Abstract

Large language model (LLM)-based agents are increasingly expected to negotiate, coordinate, and transact autonomously, yet existing benchmarks lack principled settings for evaluating language-mediated economic interaction among multiple agents. We introduce AgenticPay, a benchmark and simulation framework for multi-agent buyer-seller negotiation driven by natural language. AgenticPay models markets in which buyers and sellers possess private constraints and product-dependent valuations, and must reach agreements through multi-round linguistic negotiation rather than numeric bidding alone. The framework supports a diverse suite of over 110 tasks ranging from bilateral bargaining to many-to-many markets, with structured action extraction and metrics for feasibility, efficiency, and welfare. Benchmarking state-of-the-art proprietary and open-weight LLMs reveals substantial gaps in negotiation performance and highlights challenges in long-horizon strategic reasoning, establishing AgenticPay as a foundation for studying agentic commerce and language-based market interaction. Code and dataset are available at https://github.com/SafeRL-Lab/AgenticPay.
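
To make the notions of private constraints, feasibility, and welfare concrete, the following is a minimal sketch of a bilateral (1-to-1) deal check. It is not taken from the AgenticPay codebase: the names `BilateralTask`, `is_feasible`, and `surplus` are hypothetical, and the metrics use the textbook definition of surplus (buyer limit minus price, price minus seller limit), which may differ from the definitions used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BilateralTask:
    """Hypothetical 1-to-1 task: each side holds one private price limit."""
    buyer_max_price: float   # assumed private ceiling the buyer will pay
    seller_min_price: float  # assumed private floor the seller will accept


def is_feasible(task: BilateralTask, price: float) -> bool:
    """A deal is feasible if the price respects both private limits."""
    return task.seller_min_price <= price <= task.buyer_max_price


def surplus(task: BilateralTask, price: Optional[float]) -> Dict[str, float]:
    """Per-side surplus and total welfare; all zero if no valid deal."""
    if price is None or not is_feasible(task, price):
        return {"buyer": 0.0, "seller": 0.0, "welfare": 0.0}
    buyer = task.buyer_max_price - price
    seller = price - task.seller_min_price
    return {"buyer": buyer, "seller": seller, "welfare": buyer + seller}


if __name__ == "__main__":
    task = BilateralTask(buyer_max_price=100.0, seller_min_price=60.0)
    print(is_feasible(task, 80.0))
    print(surplus(task, 80.0))
```

Under these assumed definitions, total welfare for any feasible deal equals the gap between the two private limits, so the agreed price only determines how that gap is split between buyer and seller.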

Paper Structure (46 sections, 2 figures, 19 tables, 1 algorithm)


Figures (2)

  • Figure 1: Overview of AgenticPay. (a) Agents & Task Examples: Buyer and seller agents engage in three negotiation modes: 1-to-1 (bilateral bargaining between a single buyer and seller), 1-to-N (one buyer negotiating with multiple competing sellers, or one seller negotiating with multiple competing buyers), and N-to-N (many buyers and sellers forming a matching market). (b) Framework: Core components including Environment, Task, and Agent interact to enable multi-round negotiations. (c) Dialogue Example: A sample negotiation showing the user's product requirements, buyer-seller conversation, and final deal.
  • Figure 2: Overview of the AgenticPay task suite. Left: Ten realistic business scenarios across four categories: Consumer, Services, Supply, and Assets. Right: Task categories illustrating the progression from bilateral bargaining to full market settings along three complexity dimensions: number of buyers, number of sellers, and product set size.