Reproducibility Study of Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
Jose L. Garcia, Karolina Hajkova, Maria Marchenko, Carlos Miguel Patiño
TL;DR
This work conducts a comprehensive reproducibility study of an LLM negotiation benchmark, testing open-weight models across a wide size range and the cost-effective GPT-4o Mini. It introduces two key extensions, a Pareto-front analysis and a communication-free single-agent baseline, along with two new metrics: structural information leakage and inequality (Gini coefficient). The results show that larger open-weight models can approach proprietary performance, while small models struggle with formatting and coherence. Notably, the single-agent baseline can match multi-agent results, which calls into question whether agent communication is necessary to succeed on the benchmark. The study also discusses environmental impact, accessibility, fairness, and privacy, and highlights limitations related to prompt sensitivity. It recommends robust prompting and redesigned benchmark dynamics for a more faithful assessment of negotiation skills.
Abstract
This paper presents a reproducibility study and extension of "Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation." We validate the original findings using a range of open-weight models (1.5B to 70B parameters) and GPT-4o Mini, and we introduce several novel contributions. We analyze the Pareto front of the games, propose a communication-free baseline to test whether successful negotiations are possible without agent interaction, evaluate the performance of recent small language models, analyze structural information leakage in model responses, and implement an inequality metric to assess negotiation fairness. Our results demonstrate that smaller models (<10B parameters) struggle with format adherence and coherent responses, whereas larger open-weight models can approach the performance of proprietary models. Additionally, in many scenarios, single-agent approaches achieve results comparable to multi-agent negotiations, challenging the assumption that agent communication is necessary to perform well on the benchmark. This work also provides insights into the accessibility, fairness, environmental impact, and privacy considerations of LLM-based negotiation systems.
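As a rough illustration of two of the analyses named above, the sketch below computes a Pareto front over candidate deals and a Gini coefficient over per-agent scores. It is a minimal sketch under stated assumptions, not the study's implementation: the function names, the deal labels, and the score values are hypothetical, and it assumes each deal is summarized by one non-negative score per agent, with higher scores being better.

```python
from typing import Dict, List, Sequence


def pareto_front(deals: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of deals that no other deal dominates.

    Assumption: `deals` maps a hypothetical deal name to per-agent scores.
    """
    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        # a dominates b if it is at least as good for every agent
        # and strictly better for at least one agent.
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [
        name
        for name, scores in deals.items()
        if not any(dominates(other, scores) for other in deals.values())
    ]


def gini(scores: Sequence[float]) -> float:
    """Gini coefficient of non-negative scores (0 means perfectly equal)."""
    values = sorted(scores)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on values sorted in ascending order.
    weighted = sum((i + 1) * v for i, v in enumerate(values))
    return (2 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical two-agent scores for four candidate deals.
    example_deals = {"A": (5, 5), "B": (7, 3), "C": (4, 4), "D": (3, 7)}
    print(pareto_front(example_deals))
    print(gini([10, 10, 10]), gini([0, 0, 30]))
```

In this toy example, deal C is dominated by deal A and is excluded from the front. A Gini value near 0 indicates that the agents' scores are close to equal, while a value near (n - 1) / n indicates that a single agent captures nearly all of the score.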
