Strategic Dialogue Management via Deep Reinforcement Learning

Heriberto Cuayáhuitl, Simon Keizer, Oliver Lemon

TL;DR

The paper asks how to give AI agents strategic negotiation abilities in dialogue, and answers by applying deep Q-learning to the board game Settlers of Catan. The agents learn both to make trade offers and to respond to them, choosing only among the actions that are legal in the current game state and working from a high-dimensional state representation. The resulting DRL agents outperform random, heuristic, and supervised baselines: the DRL policy wins 53% of games against three automated players, compared with 27% for a supervised player trained on a dialogue corpus, and it also trades more actively. The authors present DRL as a promising framework for training negotiation-capable dialogue systems and as groundwork for future language-enabled strategic agents.
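The idea of restricting learning to legal actions can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_action`, the flat list of Q-values, and the integer action indices are all assumptions made for illustration. It shows an epsilon-greedy choice that only ever considers actions marked as legal in the current state.

```python
import random
from typing import Optional, Sequence


def select_action(
    q_values: Sequence[float],
    legal_actions: Sequence[int],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Epsilon-greedy action choice restricted to the legal actions.

    q_values[a] is the estimated value of action index a (hypothetical
    layout); legal_actions lists the indices allowed in this state.
    """
    if not legal_actions:
        raise ValueError("at least one legal action is required")
    rng = rng or random.Random()
    # Explore: pick uniformly among legal actions only.
    if rng.random() < epsilon:
        return rng.choice(list(legal_actions))
    # Exploit: pick the legal action with the highest Q-value,
    # ignoring illegal actions even if their Q-value is larger.
    return max(legal_actions, key=lambda a: q_values[a])


# Action 1 has the highest Q-value but is illegal here, so the greedy
# choice falls on action 2, the best of the legal ones.
best = select_action([0.2, 0.9, 0.5, -0.1], legal_actions=[0, 2, 3])
```

Restricting the argmax this way means the agent never has to learn from experience that an illegal trade is a bad choice; the environment's rules remove it from consideration up front.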

Abstract

Artificially intelligent agents equipped with strategic skills that can negotiate during their interactions with other natural or artificial agents are still underdeveloped. This paper describes a successful application of Deep Reinforcement Learning (DRL) for training intelligent agents with strategic conversational skills, in a situated dialogue setting. Previous studies have modelled the behaviour of strategic agents using supervised learning and traditional reinforcement learning techniques, the latter using tabular representations or learning with linear function approximation. In this study, we apply DRL with a high-dimensional state space to the strategic board game of Settlers of Catan, where players can offer resources in exchange for others and they can also reply to offers made by other players. Our experimental results report that the DRL-based learnt policies significantly outperformed several baselines including random, rule-based, and supervised-based behaviours. The DRL-based policy has a 53% win rate versus 3 automated players ('bots'), whereas a supervised player trained on a dialogue corpus in this setting achieved only 27%, versus the same 3 bots. This result supports the claim that DRL is a promising framework for training dialogue systems, and strategic agents with negotiation abilities.

Paper Structure

This paper contains 12 sections, 1 equation, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Integrated system of the Deep Reinforcement Learning (DRL) agent for strategic interaction. (left) GUI of the board game "Settlers of Catan" [ThomasH02]. (right) Multilayer neural network of the DRL agent (see text for details).
  • Figure 2: Learning curves of Deep Reinforcement Learners (DRLs) against random, heuristic and supervised opponents. It can be observed that DRL agents can learn from different types of opponents, even from randomly behaving ones.