Sequential Resource Trading Using Comparison-Based Gradient Estimation

Surya Murthy, Mustafa O. Karabag, Ufuk Topcu

TL;DR

This work introduces Sequential Trading with Cone Refinements (ST-CR), a comparison-based gradient estimation algorithm for two greedily rational agents negotiating sequential trades over multiple resource categories. ST-CR uses binary offer responses (accept/reject) to infer a gradient-like direction of the responding agent's utility and refines a gradient cone to identify mutually beneficial offerings, guaranteeing that after a finite number of rejections either the agents' preferences align or the responding agent reaches near-optimal utility, yielding an $\epsilon$-weak Pareto optimal state under standard smoothness and concavity assumptions. The approach combines a Stage 1 heuristic with Stage 2 cone refinement, providing theoretical guarantees and practical efficiency advantages over baselines in both continuous and discrete settings. Empirical results, including numerical experiments and a human-in-the-loop user study, show ST-CR achieves higher societal benefit with fewer offers, particularly when agent goals are aligned, and demonstrate the method's applicability to human-agent negotiation with language-model-assisted feedback parsing.

Abstract

Autonomous agents interact with other autonomous agents and humans of unknown preferences to share resources in their environment. We explore sequential trading for resource allocation in a setting where two greedily rational agents sequentially trade resources from a finite set of categories. Each agent has a utility function that depends on the amount of resources it possesses in each category. The offering agent makes trade offers to improve its utility without knowing the responding agent's utility function, and the responding agent only accepts offers that improve its utility. To facilitate cooperation between an autonomous agent and another autonomous agent or a human, we present an algorithm for the offering agent to estimate the responding agent's gradient (preferences) and make offers based on previous acceptance or rejection responses. The algorithm's goal is to reach a Pareto-optimal resource allocation state while ensuring that the utilities of both agents improve after every accepted trade. The algorithm estimates the responding agent's gradient by leveraging the rejected offers and the greedy rationality assumption, to prune the space of potential gradients. We show that, after the algorithm makes a finite number of rejected offers, the algorithm either finds a mutually beneficial trade or certifies that the current state is epsilon-weakly Pareto optimal. We compare the proposed algorithm against various baselines in continuous and discrete trading scenarios and show that it improves the societal benefit with fewer offers. Additionally, we validate these findings in a user study with human participants, where the algorithm achieves high performance in scenarios with high resource conflict due to aligned agent goals.
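The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the paper's ST-CR procedure: it assumes a hypothetical responder with a linear utility (so the gradient is constant and a rejection of offer $T$ implies exactly $g \cdot T \leq 0$), and it stands in for the gradient cone with a finite set of sampled candidate directions. The names `accepts` and `prune`, the offer size, and the sign convention (the responder's holdings change by `+offer`) are all assumptions made for illustration. Accepted offers are not applied to the state here, which is harmless only because the utility is linear.

```python
import numpy as np

def accepts(utility, state, offer):
    """Greedy rationality: accept only offers that strictly improve utility."""
    return utility(state + offer) > utility(state)

def prune(candidates, rejected_offer, tol=1e-12):
    """Keep candidate gradients g that are consistent with a rejection,
    i.e. g . T <= 0 (exact for the linear utility assumed below)."""
    return candidates[candidates @ rejected_offer <= tol]

# Hypothetical responder: linear utility with a gradient hidden from the offerer.
true_gradient = np.array([1.0, -2.0, 0.5])
true_direction = true_gradient / np.linalg.norm(true_gradient)

def utility(s):
    return float(true_gradient @ s)

state = np.array([5.0, 5.0, 5.0])

# Candidate gradient directions: random unit vectors plus the true direction,
# appended so we can check that pruning never discards it.
rng = np.random.default_rng(0)
samples = rng.normal(size=(2000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
candidates = np.vstack([samples, true_direction])
n_initial = len(candidates)

# Probe with small offers along each axis; every rejection cuts a halfspace.
remaining = candidates
for axis in np.eye(3):
    for offer in (0.1 * axis, -0.1 * axis):
        if not accepts(utility, state, offer):
            remaining = prune(remaining, offer)

print(n_initial, len(remaining))
```

Each rejected offer removes the candidates on the wrong side of one hyperplane, so the surviving set shrinks while still containing the true direction. For a nonlinear utility the cut is only approximate and depends on the offer size.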

Paper Structure (43 sections, 3 theorems, 65 equations, 13 figures, 5 tables, 12 algorithms)


Key Result

Theorem 1

Let $\kappa \geq \sqrt{n-1}$ be a parameter satisfying […]. If $f^{B}$ is $\beta$-smooth, then at least one of the following holds after the cone refinement procedure (CR) makes $k$ rejected offers at state $(S_{A}, S_{B})$:

Figures (13)

  • Figure 1: ST-CR maintains a cone of potential gradients (a) and uses responses to the trade offer $T$ to refine the cone (b).
  • Figure 2: 2D top-down view of 3D cone refinement using the plane with normal $\tau$. The ellipses are the cross-sections of the cones. The points are the cross-sections of the true and hypothetical gradients. $h(T_{i})$ is the hyperplane generated by offer $T_{i}$. The shaded region represents all possible gradient directions after the halfspace cuts using $h(T_{i})$s.
  • Figure 3: Top-down view of cone refinement. Discrete offers lead to non-orthogonal or off-center cuts. Off-center cuts occur if an offer is not orthogonal to $\tau$ and non-orthogonal cuts occur if offers are not mutually orthogonal.
  • Figure 4: Offer-benefit plots for discrete trading scenarios.
  • Figure 5: Offer-benefit plots for continuous trading scenarios.
  • ...and 8 more figures
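
The halfspace cuts mentioned in the captions of Figures 2 and 3 can be motivated by a short derivation. This is a sketch under assumptions, not the paper's exact statement: it assumes that accepting offer $T$ changes the responder's holdings from $S_{B}$ to $S_{B} + T$, and that $h(T)$ denotes the hyperplane through the origin with normal $T$. If $f^{B}$ is $\beta$-smooth, then

$$f^{B}(S_{B} + T) \geq f^{B}(S_{B}) + \nabla f^{B}(S_{B})^{\top} T - \frac{\beta}{2} \|T\|_{2}^{2}.$$

A greedily rational responder rejects $T$ only when $f^{B}(S_{B} + T) \leq f^{B}(S_{B})$. Combining the two inequalities gives

$$\nabla f^{B}(S_{B})^{\top} T \leq \frac{\beta}{2} \|T\|_{2}^{2}.$$

For a sufficiently small offer the right-hand side is negligible, so a rejection places the gradient (approximately) in the halfspace $\{ g : g^{\top} T \leq 0 \}$ bounded by $h(T)$.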

Theorems & Definitions (7)

  • Remark 1: Greedy Rationality Assumption
  • Remark 2: Performance in Unaligned Settings
  • Theorem 1
  • Corollary 1.1: Weak Pareto Optimality
  • Proof of Theorem 1
  • Lemma 2
  • Proof of Lemma 2