Towards General Negotiation Strategies with End-to-End Reinforcement Learning

Bram M. Renting, Thomas M. Moerland, Holger H. Hoos, Catholijn M. Jonker

TL;DR

An end-to-end reinforcement learning method for diverse negotiation problems is developed that represents observations and actions as a graph and applies graph neural networks in the policy. The method is shown to learn to negotiate with other agents on never-before-seen negotiation problems.

Abstract

The research field of automated negotiation has a long history of designing agents that can negotiate with other agents. Such negotiation strategies are traditionally based on manual design and heuristics. More recently, reinforcement learning approaches have also been used to train agents to negotiate. However, negotiation problems are diverse, causing observation and action dimensions to change, which cannot be handled by default linear policy networks. Previous work on this topic has circumvented this issue either by fixing the negotiation problem, which makes policies non-transferable between negotiation problems, or by abstracting the observations and actions into fixed-size representations, which causes loss of information and expressiveness due to feature design. We developed an end-to-end reinforcement learning method for diverse negotiation problems by representing observations and actions as a graph and applying graph neural networks in the policy. With empirical evaluations, we show that our method is effective and that we can learn to negotiate with other agents on never-before-seen negotiation problems. Our result opens up new opportunities for reinforcement learning in negotiation agents.

Paper Structure

This paper contains 15 sections, 4 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Overview of our designed policy network based on GNNs. Observations are encoded in a graph representation (left) and passed through GNNs. Action distribution logits and state-value are obtained by passing the learned representation of the head node and value nodes through linear layers.
  • Figure 2: Mean and 99% confidence interval of episodic return during training based on results from 10 random seeds. The results of the policy designed by Higa et al. (2023) and our policy are plotted.
  • Figure 3: Evaluation results of the policy designed by Higa et al. (2023) and our GNN-based policy. Results are obtained by evaluating each trained policy for 1000 negotiation games against the set of baseline agents. Mean and 99% confidence interval are plotted based on 10 training iterations.
  • Figure 4: Mean and 99% confidence interval of episodic return during training of our GNN policy based on results from 10 different random seeds. The results from training against the baseline agents and training against the competition agents are plotted.
  • Figure 5: Evaluation results of our GNN-based policy on randomly generated negotiation problems, both against the set of baseline opponents (left) and against the full set of opponents (right). Results are obtained by evaluating each trained policy for 1000 negotiation games against the set of agents. Mean and 99% confidence interval are plotted based on 10 training iterations.
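
The idea in the Figure 1 caption, a graph-encoded observation passed through GNN layers with logits read from value nodes and a state-value read from the head node, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The node features, the embedding size `DIM`, the mean-aggregation message-passing rule, the random weights, and all function names are assumptions chosen only to show why one fixed set of parameters can serve negotiation problems of different sizes.

```python
import math
import random

DIM = 4  # hypothetical embedding size


def make_params(seed=0):
    """Random weights shared by every node, so their shapes never depend on the problem size."""
    rng = random.Random(seed)
    mat = lambda: [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
    vec = lambda: [rng.uniform(-1, 1) for _ in range(DIM)]
    return {"W_self": mat(), "W_nbr": mat(), "w_logit": vec(), "w_value": vec()}


def build_graph(utilities):
    """Encode a problem as head node -> issue nodes -> value nodes.

    `utilities[i][j]` is an assumed scalar feature for value j of issue i.
    """
    feats = [[1.0, 0.0, 0.0, 0.0]]  # node 0 is the head node
    edges, value_nodes = [], []
    for issue_utils in utilities:
        issue = len(feats)
        feats.append([0.0, 1.0, 0.0, 0.0])
        edges.append((0, issue))
        ids = []
        for u in issue_utils:
            ids.append(len(feats))
            edges.append((issue, len(feats)))
            feats.append([0.0, 0.0, 1.0, float(u)])
        value_nodes.append(ids)
    return feats, edges, value_nodes


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def message_pass(h, edges, W_self, W_nbr):
    """One round: h_i <- tanh(W_self h_i + W_nbr * mean of neighbour embeddings)."""
    nbrs = [[] for _ in h]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    out = []
    for i, h_i in enumerate(h):
        count = max(len(nbrs[i]), 1)  # guard against isolated nodes
        mean = [sum(h[j][k] for j in nbrs[i]) / count for k in range(DIM)]
        pre = [s + m for s, m in zip(matvec(W_self, h_i), matvec(W_nbr, mean))]
        out.append([math.tanh(x) for x in pre])
    return out


def policy_forward(utilities, params, rounds=2):
    """Return per-issue logits (one per value node) and a scalar state-value."""
    h, edges, value_nodes = build_graph(utilities)
    for _ in range(rounds):
        h = message_pass(h, edges, params["W_self"], params["W_nbr"])
    logits = [[dot(params["w_logit"], h[v]) for v in ids] for ids in value_nodes]
    state_value = dot(params["w_value"], h[0])  # read from the head node
    return logits, state_value


params = make_params(seed=0)
small = [[0.1, 0.9], [0.5, 0.2, 0.7]]            # 2 issues
large = [[0.3] * 5, [0.1, 0.2], [0.4, 0.6, 0.8, 1.0]]  # 3 issues
logits_small, value_small = policy_forward(small, params)
logits_large, value_large = policy_forward(large, params)
print([len(l) for l in logits_small], [len(l) for l in logits_large])
```

The same `params` are applied to both problems. The number of logits follows the number of value nodes in each graph, whereas a fixed-size linear policy would need a different input and output dimension for each problem.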