Robot See, Robot Do: Imitation Reward for Noisy Financial Environments

Sven Goluža, Tomislav Kovačević, Stjepan Begušić, Zvonko Kostanjčar

TL;DR

This work integrates imitation (expert’s) feedback with reinforcement (agent’s) feedback in a model-free RL algorithm, effectively embedding the imitation learning problem within the RL paradigm to handle the stochasticity of reward signals.

Abstract

The sequential nature of decision-making in financial asset trading aligns naturally with the reinforcement learning (RL) framework, making RL a common approach in this domain. However, the low signal-to-noise ratio in financial markets results in noisy estimates of environment components, including the reward function, which hinders effective policy learning by RL agents. Given the critical importance of reward function design in RL problems, this paper introduces a novel and more robust reward function by leveraging imitation learning, where a trend labeling algorithm acts as an expert. We integrate imitation (expert's) feedback with reinforcement (agent's) feedback in a model-free RL algorithm, effectively embedding the imitation learning problem within the RL paradigm to handle the stochasticity of reward signals. Empirical results demonstrate that this novel approach improves financial performance metrics compared to traditional benchmarks and RL agents trained solely using reinforcement feedback.
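The idea of combining the two feedback signals can be illustrated with a minimal sketch. The paper's actual reward definition is not reproduced in this section, so everything below is an assumption made for illustration: the function names, the convex blending weight `weight`, the ±1 agreement score, and the commission model are all hypothetical and are not taken from the authors' method.

```python
def reinforcement_feedback(position, prev_position, price_return, commission_bps):
    """Hypothetical reinforcement feedback: position return net of trading costs.

    position, prev_position: -1 (short), 0 (flat) or 1 (long).
    price_return: simple return of the asset over the step.
    commission_bps: cost per unit of position change, in basis points.
    """
    cost = abs(position - prev_position) * commission_bps * 1e-4
    return position * price_return - cost


def imitation_feedback(position, expert_position):
    """Hypothetical imitation feedback: +1 if the agent matches the expert label, else -1."""
    return 1.0 if position == expert_position else -1.0


def blended_reward(rf, imf, weight):
    """Assumed convex combination of reinforcement (rf) and imitation (imf) feedback."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * rf + weight * imf


# Example step: the agent goes long, the expert label is also long.
rf = reinforcement_feedback(position=1, prev_position=0, price_return=0.01, commission_bps=3)
imf = imitation_feedback(position=1, expert_position=1)
reward = blended_reward(rf, imf, weight=0.5)
```

In this sketch the imitation term does not depend on the realized price move at a single step, so it is one way an expert signal could be less sensitive to step-level noise than the raw return. Whether the authors combine the signals this way is not stated here.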

Paper Structure

This paper contains 15 sections, 8 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: Two positions with $\Delta t_1$ and $\Delta t_2$ holding periods yielding $r_1$ and $r_2$ returns, respectively, as a result of two labeled uptrends.
  • Figure 2: Four different sets of labels obtained using the oracle labeling algorithm, each considering a different level of commission costs, expressed in basis points (bps).
  • Figure 3: Proposed reward signal ($r^{RIF}$) compared to the reinforcement feedback ($r^{RF}$) over $100{,}000$ time steps of random policy evaluation on asset ES. The expert commission is set to $\vartheta = 3$ bps, while the trading commission in the experiment is set to $\phi=3$ bps.