Towards Goal-Oriented Agents for Evolving Problems Observed via Conversation

Michael Free; Andrew Langworthy; Mary Dimitropoulaki; Simon Thompson

TL;DR

This work addresses evolving problems that an agent must solve through goal-directed dialogue, observing the environment only through a conversational intermediary. It proposes a three-component system: a gridworld environment, a simulated MAC-based user, and a double-DQN-driven reinforcement learning (RL) agent that encodes utterances with BERT, tracks dialogue history with an LSTM, and selects from 14 possible agent utterances. Using a large, procedurally generated dataset and curriculum learning, the study shows that the LSTM-based agent achieves high task completion rates (around 84%) and learns efficient questioning and moving strategies, while curriculum learning reduces the required training data by about 40%. The results highlight both the potential and the limitations of transferring such conversational agents to real problems, and point to future work on more diverse simulated users and larger, more realistic environments to improve robustness and applicability.
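The double-DQN update referred to above decouples action selection from action evaluation: the online network picks the best next utterance, and the target network scores it. The sketch below computes one bootstrapped target for a single transition over the 14 agent utterances. It is a minimal illustration under stated assumptions, not the authors' code: the helper name `double_dqn_target`, the discount factor, and the Q-values in the usage example are all hypothetical stand-ins for the outputs of networks that, in the described system, would sit on top of the BERT and LSTM encoders.

```python
from typing import Sequence

NUM_UTTERANCES = 14  # size of the agent's action space, as described above


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> float:
    """Bootstrapped double-DQN target for one transition (hypothetical helper)."""
    if len(q_online_next) != NUM_UTTERANCES or len(q_target_next) != NUM_UTTERANCES:
        raise ValueError("expected one Q-value per agent utterance")
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        return reward
    # The online network *selects* the next utterance...
    best = max(range(NUM_UTTERANCES), key=lambda a: q_online_next[a])
    # ...and the target network *evaluates* that choice.
    return reward + gamma * q_target_next[best]


# Usage with made-up Q-values for the next dialogue state.
q_online = [0.0] * NUM_UTTERANCES
q_online[3] = 1.0
q_target = [0.5] * NUM_UTTERANCES
q_target[3] = 0.2
target = double_dqn_target(-1.0, False, q_online, q_target, gamma=0.9)
```

With these numbers the target is -1.0 + 0.9 × 0.2 = -0.82, whereas a vanilla DQN target, which takes the maximum of the target network's own values, would give -1.0 + 0.9 × 0.5 = -0.55. This gap is the overestimation that double DQN is designed to reduce.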

Abstract

The objective of this work is to train a chatbot capable of solving evolving problems by conversing with a user about a problem the chatbot cannot directly observe. The system consists of a virtual problem (in this case a simple game), a simulated user that can observe and perform actions on the problem and answer natural language questions about it, and a Deep Q-Network (DQN)-based chatbot architecture. The chatbot is trained with reinforcement learning to solve the problem through dialogue with the simulated user. The contributions of this paper are: a proposed architecture for applying a conversational DQN-based agent to evolving problems; an exploration of how training methods such as curriculum learning affect model performance; and an examination of the effect of modified reward functions as environment complexity increases.
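To make the three-part setup concrete, the sketch below wires together a toy grid game, a simulated user that sees the game and replies with templated sentences, and an agent that can act only by talking to that user. Everything here is an illustrative assumption: `GridGame`, `SimulatedUser`, the utterance strings, and the scripted policy in `run_episode` are hypothetical and are not the paper's environment or its 14-utterance action set. The scripted policy (ask for the goal, ask for the player's position, then issue moves) stands in for the learned DQN agent purely to show the information flow.

```python
from dataclasses import dataclass
from typing import Tuple

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class GridGame:
    """Hypothetical hidden problem: a player must reach a goal cell."""
    size: int
    player: Tuple[int, int]
    goal: Tuple[int, int]

    def move(self, direction: str) -> None:
        dx, dy = MOVES[direction]
        x, y = self.player
        # Clamp to the grid, so a move into a wall leaves the player in place.
        self.player = (min(max(x + dx, 0), self.size - 1),
                       min(max(y + dy, 0), self.size - 1))

    def solved(self) -> bool:
        return self.player == self.goal


class SimulatedUser:
    """Observes and acts on the game; replies with templated sentences."""

    def __init__(self, game: GridGame) -> None:
        self.game = game

    def respond(self, utterance: str) -> str:
        if utterance == "Where are you?":
            return "I am at {} {}.".format(*self.game.player)
        if utterance == "Where is the goal?":
            return "The goal is at {} {}.".format(*self.game.goal)
        prefix = "Please move "
        if utterance.startswith(prefix):
            direction = utterance[len(prefix):].rstrip(".")
            if direction in MOVES:
                self.game.move(direction)
                return "Done."
        return "Sorry, I don't understand."


def parse_xy(reply: str) -> Tuple[int, int]:
    """Extract the two coordinates from a templated reply."""
    nums = [int(t.strip(".")) for t in reply.split() if t.strip(".").isdigit()]
    return nums[-2], nums[-1]


def run_episode(game: GridGame, max_turns: int = 50) -> Tuple[bool, int]:
    """Scripted stand-in for the learned policy; it acts only via the user."""
    user = SimulatedUser(game)
    gx, gy = parse_xy(user.respond("Where is the goal?"))
    turns = 1
    while turns < max_turns:
        px, py = parse_xy(user.respond("Where are you?"))
        turns += 1
        if (px, py) == (gx, gy):
            return True, turns
        if px != gx:
            direction = "right" if gx > px else "left"
        else:
            direction = "down" if gy > py else "up"
        user.respond(f"Please move {direction}.")
        turns += 1
    # Out of turns: report the true outcome for episode bookkeeping only.
    return game.solved(), turns


game = GridGame(size=4, player=(0, 0), goal=(2, 1))
success, turns_used = run_episode(game)
```

The policy never reads `game` directly: it learns the goal and its own position only from the user's replies, which is the constraint that distinguishes this setting from ordinary gridworld RL. Replacing the scripted branch logic with Q-value-based selection over a fixed utterance set, and adding a per-turn reward, would move the sketch toward the kind of agent the abstract describes.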