Reinforcement Learning for Financial Index Tracking

Xianhua Peng, Chenyin Gong, Xue Dong He

TL;DR

The proposed RL method uses a novel training scheme to overcome the data limitation of having only a single sample path of financial data. It outperforms a benchmark method in tracking accuracy and has the potential to earn extra profit through a cash withdrawal strategy.

Abstract

We propose the first discrete-time infinite-horizon dynamic formulation of the financial index tracking problem under both return-based tracking error and value-based tracking error. The formulation overcomes the limitations of existing models by incorporating the intertemporal dynamics of market information variables not limited to prices, allowing exact calculation of transaction costs, accounting for the tradeoff between overall tracking error and transaction costs, and allowing effective use of data over a long time period. The formulation also allows novel decision variables of cash injection or withdrawal. We propose to solve the portfolio rebalancing equation using a Banach fixed point iteration, which allows accurate calculation of transaction costs that, in practice, are specified as nonlinear functions of trading volumes. We propose an extension of a deep reinforcement learning (RL) method to solve the dynamic formulation. Our RL method uses a novel training scheme to resolve the data limitation that arises because only a single sample path of financial data is available. A comprehensive empirical study based on a 17-year-long testing set demonstrates that the proposed method outperforms a benchmark method in terms of tracking accuracy and has the potential to earn extra profit through a cash withdrawal strategy.
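The Banach fixed point iteration mentioned in the abstract can be illustrated with a minimal sketch. The exact rebalancing equation is not reproduced on this page, so the cost model below is an assumption: each asset $i$ is charged a per-share fee `xi1[i]` plus a proportional fee `xi2[i]` on the dollar amount traded, and the post-rebalance value $V$ solves $V = V_{-} - \sum_i (\xi_{1i}/p_i + \xi_{2i})\,|w_i V - w_{i,-} V_{-}|$. The function name `rebalance_value` and all parameter names are hypothetical. The contraction check mirrors only the third condition of Proposition 3.1.

```python
import numpy as np


def rebalance_value(v_before, w_before, w_target, prices, xi1, xi2,
                    tol=1e-12, max_iter=1000):
    """Solve V = v_before - cost(V) by fixed point iteration.

    Assumed (hypothetical) cost model, not taken from the paper:
    cost(V) = sum_i (xi1_i / p_i + xi2_i) * |w_target_i * V - w_before_i * v_before|
    """
    w_before = np.asarray(w_before, dtype=float)
    w_target = np.asarray(w_target, dtype=float)
    prices = np.asarray(prices, dtype=float)
    # Cost per dollar traded: per-share fee converted to dollars plus proportional fee.
    rate = np.asarray(xi1, dtype=float) / prices + np.asarray(xi2, dtype=float)

    # Lipschitz constant of the map V -> v_before - cost(V); it must be < 1
    # for the map to be a contraction, so that the iteration converges.
    lipschitz = float(np.sum(rate * np.abs(w_target)))
    if lipschitz >= 1.0:
        raise ValueError("contraction condition violated: Lipschitz constant >= 1")

    v = float(v_before)
    for _ in range(max_iter):
        traded = np.abs(w_target * v - w_before * v_before)
        v_next = v_before - float(np.sum(rate * traded))
        # Relative stopping rule, so the tolerance scales with portfolio value.
        if abs(v_next - v) <= tol * max(1.0, abs(v_next)):
            return v_next
        v = v_next
    raise RuntimeError("fixed point iteration did not converge")


# Illustrative inputs only (not data from the paper).
v_after = rebalance_value(
    v_before=1_000_000.0,
    w_before=[0.5, 0.5],
    w_target=[0.6, 0.4],
    prices=[100.0, 50.0],
    xi1=[0.005, 0.005],
    xi2=[0.001, 0.001],
)
```

Because the error shrinks by the Lipschitz factor at every step, realistic fee levels (well below 1% per dollar traded) make the iteration converge in a handful of steps.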

Paper Structure

This paper contains 37 sections, 9 theorems, 80 equations, 15 figures, 13 tables, 2 algorithms.

Key Result

Proposition 3.1

($i$) If $V_{(t+\frac{k}{M})-} > 0$, $\sum_{i=1}^{N} \xi_{2i} | w_{i, (t+\frac{k}{M})-} | < 1$, and $\sum_{i=1}^{N} (\frac{\xi_{1i}}{p_{i, t+\frac{k}{M}}} + \xi_{2i})| w_{i, t+\frac{k}{M}} | < 1$, then the function on the left-hand side of Equation \ref{equ:V_fixed_point_prob} is a contraction on $\mathbb

Figures (15)

  • Figure 1: Estimation of the advantage $\hat{A}_{t_0}$ in the case of $t_0 + n \le T_{\text{train}}$.
  • Figure 2: Estimation of the advantage $\hat{A}_{t_0}$ in the case of $t_0 + n > T_{\text{train}}$.
  • Figure 3: Rolling windows for training and testing.
  • Figure 4: Learning curves for the return-based tracking of S&P 500 index on the training window 01/02/1998-01/02/2018. In the left subfigure, the $x$-axis is the updating step and the $y$-axis is the loss defined in Equation \ref{eq:total_loss}. In the right subfigure, the $x$-axis is the training step and the $y$-axis is the cumulative reward.
  • Figure 5: Out-of-sample return-based tracking error (R-TE) of the proposed RL method and the MM method from 2005 to 2021 for the S&P 500 index. The mean and standard error of the proposed RL method's R-TE across the testing years are 2.378E-03 and 3.050E-04, respectively. The mean and standard error of the MM method's R-TE across the testing years are 2.412E-03 and 3.001E-04, respectively.
  • ...and 10 more figures

Theorems & Definitions (22)

  • Proposition 3.1
  • proof
  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • Lemma A.3
  • ...and 12 more