Recurrent Reinforcement Learning: A Hybrid Approach

Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He

TL;DR

The paper addresses learning control policies in partially observable customer relationship management (CRM) settings. It proposes a hybrid model that jointly trains a supervised RNN/LSTM, which learns the hidden-state representation, with a deep Q-network (DQN). The SL-RNN/LSTM supplies history-aware state representations, the RL-DQN component optimizes long-term reward, and the two are coordinated through interleaved gradient updates. Simulator-based experiments on a KDD Cup direct mailing CRM dataset show that the hybrid models outperform purely supervised and purely reinforcement-learning baselines, and that joint training yields further gains. The work demonstrates the practical viability of deep RL with learned state representations for non-Markovian, real-world sequential decision problems such as CRM.

Abstract

Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states. It is in general very challenging to construct and infer hidden states, as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain. In particular, we propose a new family of hybrid models that combine the strengths of both supervised learning (SL) and reinforcement learning (RL), trained in a joint fashion. The SL component can be a recurrent neural network (RNN) or its long short-term memory (LSTM) version, which has the desired property of capturing long-term dependencies on history, thus providing an effective way of learning the representation of hidden states. The RL component is a deep Q-network (DQN) that learns to optimize the control for maximizing long-term rewards. Extensive experiments on a direct mailing campaign problem demonstrate the effectiveness and advantages of the proposed approach, which performs best among a set of previous state-of-the-art methods.
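
The data flow described in the abstract can be illustrated with a small sketch: a recurrent network summarizes the observation history into a hidden state, supervised heads predict the next observation and the reward from that state, and a Q head reads the same state to score actions. This is a toy illustration under stated assumptions, not the authors' implementation: the class name `HybridSketch`, the layer sizes, the plain tanh RNN cell, the linear heads, and the action-independent reward head are all hypothetical simplifications, and no training is shown.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the initialization scheme is an assumption."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class HybridSketch:
    """Toy supervised RNN whose hidden state feeds a linear Q head."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_in = rand_matrix(hidden_dim, obs_dim, rng)      # observation -> hidden
        self.W_rec = rand_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden
        self.W_obs = rand_matrix(obs_dim, hidden_dim, rng)     # SL head: next observation
        self.W_rew = rand_matrix(1, hidden_dim, rng)           # SL head: reward (simplified)
        self.W_q = rand_matrix(n_actions, hidden_dim, rng)     # RL head: Q-values

    def step(self, obs, h_prev):
        """One RNN step: h_t = tanh(W_in o_t + W_rec h_{t-1})."""
        pre = [a + b for a, b in zip(matvec(self.W_in, obs), matvec(self.W_rec, h_prev))]
        return [math.tanh(p) for p in pre]

    def heads(self, h):
        """Read the SL predictions and the Q-values from the same hidden state."""
        next_obs = matvec(self.W_obs, h)
        reward = matvec(self.W_rew, h)[0]
        q_values = matvec(self.W_q, h)
        return next_obs, reward, q_values


def run_episode(model, observations):
    """Unroll the RNN over a sequence and collect the head outputs per step."""
    h = [0.0] * model.hidden_dim
    outputs = []
    for obs in observations:
        h = model.step(obs, h)
        outputs.append(model.heads(h))
    return h, outputs
```

Because the hidden state is carried across steps, two histories that end in the same observation generally produce different Q-values, which is the property that matters in a partially observable task.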

Paper Structure

This paper contains 18 sections, 2 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Supervised RNN + Reinforced DQN: $o_t$ is the observation, $\tilde{h}_t$ is the hidden state of the RNN, $o_{t+1}'$ is the predicted observation for time $t+1$, $R_t$ is the predicted reward, and $Q(s, a)_t$ is the predicted Q-value at time $t$. The blue parts correspond to an unfolded RNN for SL, and the red parts to the DQN. In this hybrid model, the input to the DQN is the hidden layers of the supervised RNN model.
  • Figure 2: Learning curve for RL models
  • Figure 3: Supervised Learning and Reinforcement Learning under RNN Simulator with different simulation data (U, M, R). Each group has eight models: three SL models and five RL models.
  • Figure 4: Supervised Learning and Reinforcement Learning under RNN Simulator with different data size ($50K$, $100K$, $200K$, and $500K$). Each group has eight models: three SL models and five RL models.
  • Figure 5: An unfolded supervised learning RNN: $o_t$ is the observation, $\tilde{h}_t$ is the hidden state of the RNN, and $R(s, a)_t$ is the predicted reward at time $t$, where $s$ is the RNN hidden state $\tilde{h}_t$.
  • ...and 1 more figure
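
The Figure 1 caption describes a DQN that predicts $Q(s, a)_t$ from the RNN hidden state. As a reminder of what such a network is usually regressed toward, the sketch below computes the standard one-step Q-learning target and the resulting temporal-difference error. It assumes the textbook DQN target; the function names, the discount factor `gamma`, and the `terminal` flag are hypothetical and are not taken from the paper, which may use different settings.

```python
def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values are the Q-values predicted from the next hidden state.
    The default gamma is an illustrative assumption.
    """
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_values, action, target):
    """Difference between the target and the Q-value of the action taken."""
    return target - q_values[action]
```

In a hybrid model of this kind, `next_q_values` would come from the Q head applied to the hidden state at time $t+1$, so the quality of the learned state representation directly affects the target.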