
A Benchmark for Multi-Party Negotiation Games from Real Negotiation Data

Leo Benac, Jonas Raedler, Zilin Ma, Finale Doshi-Velez

Abstract

Many real-world multi-party negotiations unfold as sequences of binding, action-level commitments rather than a single final outcome. We introduce a benchmark for this under-studied regime featuring a configurable game generator that sweeps key structural properties such as incentive alignment, goal complexity, and payoff distribution. To evaluate decision-making, we test three value-function approximations (myopic reward, an optimistic upper bound, and a pessimistic lower bound) that act as biased lenses on deal evaluation. Through exact evaluation on small games and comparative evaluation on large, document-grounded instances derived from the Harvard Negotiation Challenge, we map the strategic regimes where each approximation succeeds or fails. We observe that different game structures demand different valuation strategies, motivating agents that learn robust state values and plan effectively over long horizons under binding commitments and terminal-only rewards.

Paper Structure (27 sections, 4 equations, 6 figures, 3 tables, 3 algorithms)


Figures (6)

  • Figure 1: Algorithm performance on small games, measured by L1 error against the exact optimal self-interested payoff (lower is better; note that y-scales differ across columns). Points show mean $\pm$ standard error over 50 random seeds, averaged across goal complexities and latent factors. Columns correspond to payoff regimes (ordered hardest to easiest) and rows to incentive alignment; the x-axis sweeps the fraction of non-linear (all-or-nothing) goals. The Myopic Reward method performs best in balanced games (left). The Upper bound performs best in negative-dominated regimes and gains a clear advantage when a Poison Pill (PP) trap is injected (middle). The Lower bound performs best in positive-dominated games (right). We omit the Upper bound in the positive-dominated regime because its large errors would dominate the scale. Across approximations, error increases with non-linearity and is higher in adversarial settings.
  • Figure 2: Caption for topfile 1 results.
  • Figure 3: Caption for topfile 2 results.
  • Figure 4: Caption for topfile 3 results.
  • Figure 5: Caption for topfile 4 results.
  • ...and 1 more figure
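
The metric reported in Figure 1 (L1 error against the exact optimal payoff, summarized as mean ± standard error over seeds) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `l1_error_summary` and the toy payoffs are hypothetical, and it assumes one scalar payoff per seed, with the error taken as the absolute difference per seed.

```python
import math

def l1_error_summary(achieved, optimal):
    """Return (mean, standard error) of |achieved - optimal| across seeds.

    Assumption: one scalar payoff per seed for both sequences.
    """
    if len(achieved) != len(optimal) or len(achieved) < 2:
        raise ValueError("need two equal-length sequences with at least 2 seeds")
    errors = [abs(a - o) for a, o in zip(achieved, optimal)]
    n = len(errors)
    mean = sum(errors) / n
    # Sample variance (n - 1 denominator), then standard error of the mean.
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)
    return mean, math.sqrt(var / n)

# Hypothetical payoffs for 4 seeds (Figure 1 uses 50 seeds).
mean, se = l1_error_summary([8.0, 9.0, 7.0, 10.0], [10.0, 10.0, 10.0, 10.0])
print(f"L1 error: {mean:.2f} +/- {se:.2f}")
```

Using the sample variance (dividing by n - 1) is one common convention for the standard error; the source does not state which convention the figure uses.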