Few-Shot Learning Patterns in Financial Time-Series for Trend-Following Strategies

Kieran Wood, Samuel Kessler, Stephen J. Roberts, Stefan Zohren

TL;DR

The authors propose the Cross Attentive Time-Series Trend Network (X-Trend), which takes positions by attending over a context set of financial time-series regimes and allows the relationship between forecasts and patterns in the context set to be interpreted.

Abstract

Forecasting models for systematic trading strategies do not adapt quickly when financial market conditions change rapidly, as was seen at the onset of the COVID-19 pandemic in 2020, which caused many forecasting models to take loss-making positions. To deal with such situations, we propose a novel time-series trend-following forecaster that can quickly adapt to new market conditions, referred to as regimes. We leverage recent developments from the deep learning community and use few-shot learning. We propose the Cross Attentive Time-Series Trend Network (X-Trend), which takes positions by attending over a context set of financial time-series regimes. X-Trend transfers trends from similar patterns in the context set to make forecasts, then takes positions for a new, distinct target regime. By quickly adapting to new financial regimes, X-Trend increases the Sharpe ratio by 18.9% over a neural forecaster and 10-fold over a conventional Time-series Momentum strategy during the turbulent market period from 2018 to 2023. Our strategy recovers twice as quickly from the COVID-19 drawdown as the neural forecaster. X-Trend can also take zero-shot positions on novel, unseen financial assets, obtaining a 5-fold Sharpe ratio increase over a neural time-series trend forecaster for the same period. Furthermore, the cross-attention mechanism allows us to interpret the relationship between forecasts and patterns in the context set.
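
The abstract reports results as Sharpe ratios and drawdown recovery times. As a rough illustration of how such metrics are commonly computed from a series of daily strategy returns, here is a minimal sketch. It is not the authors' evaluation code: the 252-day annualisation factor, the zero risk-free rate, the additive cumulative-return curve, and the function names are all assumptions made for this example.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualisation factor, not stated in the text


def annualised_sharpe(daily_returns: np.ndarray) -> float:
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    mean = daily_returns.mean()
    vol = daily_returns.std(ddof=1)
    return float(np.sqrt(TRADING_DAYS) * mean / vol)


def drawdown(daily_returns: np.ndarray) -> np.ndarray:
    """Distance of the additive cumulative-return curve below its running peak."""
    cumulative = np.cumsum(daily_returns)
    # The curve starts at zero, so the running peak is never below zero.
    running_peak = np.maximum.accumulate(np.maximum(cumulative, 0.0))
    return running_peak - cumulative


def recovery_days(daily_returns: np.ndarray):
    """Days from the deepest drawdown until the prior peak is regained.

    Returns None if the curve never recovers within the sample.
    """
    dd = drawdown(daily_returns)
    trough = int(np.argmax(dd))
    recovered = np.flatnonzero(dd[trough:] == 0.0)
    return int(recovered[0]) if recovered.size else None


# Hypothetical daily returns, purely for illustration.
example_returns = np.array([0.01, -0.02, -0.01, 0.02, 0.02, 0.01])
example_sharpe = annualised_sharpe(example_returns)
example_max_drawdown = float(drawdown(example_returns).max())
example_recovery = recovery_days(example_returns)
```

Because the Sharpe ratio divides mean return by volatility, it is unchanged when every return is scaled by the same positive constant, which is why strategies re-scaled to a common volatility target can be compared on this metric.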

Paper Structure

This paper contains 25 sections, 22 equations, 11 figures, 6 tables, 1 algorithm.

Figures (11)

  • Figure 1: An overview of the X-Trend few-shot learning trend-following model. 1) Each asset is segmented into regimes using a change-point detection algorithm. 2) The context set is constructed by randomly sampling regimes from different assets. The objective is to produce long/short positions given the target sequence while respecting causality with the context set. 3) Our X-Trend model uses a cross-attention layer so that the target can leverage patterns in the context set. 4) The model produces a distribution over next-day returns. 5) The model outputs positions using the Predictive probability density function To traded Position (PTP) module. 6) We train our model by jointly optimizing a Sharpe ratio loss and the negative log-likelihood.
  • Figure 2: An illustration of the different ways the target is able to attend to the context set. F: every hidden state in the target sequence is able to attend to the final hidden states of the contexts. T: the time-equivalent hidden state in the target is able to attend to the corresponding hidden state in the contexts. C: every hidden state in the target sequence is able to attend to the final hidden state in the change-point segmented contexts. The dark arrows illustrate the context time-steps that the fourth target time-step attends to.
  • Figure 3: A time-series segmented with change-point detection to create sequences for the context set. Different colours denote different regimes. This example shows the continuous, ratio-adjusted British Pound Sterling futures contract. Here, for illustrative purposes, regimes are segmented with a change-point threshold of $L_C / (L_M + L_C) \geq 0.99$, where $L_M$ is the likelihood of fitting a Gaussian Process characterized by a Matérn 3/2 kernel, and $L_C$ is the likelihood of fitting one characterized by a Change-point kernel. Details of this procedure can be found in \ref{app:cpd}.
  • Figure 4: Encoder and decoder of the X-Trend-G model. The FFN, VSN, Self-Attention and Cross-Attention components are all applied element-wise to each time-step. Sequences in the context set are mapped to representations via $\Xi_{\text{key}}(\cdot, \cdot)$ and $\Xi_{\text{value}}(\cdot, \cdot)$. For the key inputs we exclude next-day returns and use $\mathbf{x}_{-l_c:t_c}^{(c)}$ instead of $\boldsymbol{\xi}_{-l_c:t_c}^{(c)}$. Contexts are then passed to the cross-attention as keys and values, with a representation of the target sequence $\mathbf{x}_{t'}^{(i)}$, for which we want to make forecasts, as the query. Note that there is a separate instance of keys $K_{t'}$ and values $V_{t'}$ for the query $q_{t'}$ at each time-step $t'\in (t-l_t+1:t)$, as detailed in \ref{fig:context-hidden}. The decoder then takes the target sequence and the output representation from the encoder, $\mathbf{y}_{t'}^{(i)}$. It outputs a position for the Sharpe stream and, for the maximum-likelihood forecast stream, the parameters $(\mu_{t'}, \sigma_{t'})$. Side information $s^{(i)}$ regarding the target asset is also passed as input to the decoder for few-shot learning only, not for zero-shot learning. If we are not using the joint loss function, the Sharpe-stream output is instead taken after the second-to-last FFN.
  • Figure 5: Cumulative strategy returns (left) and drawdown (right) in the few-shot setting, averaged across 10 full repeats, with an additional portfolio volatility re-scaling step to 15% volatility. To reduce clutter, we only plot the drawdown of the Base Learner and X-Trend-Q, which is the primary comparison.
  • ...and 6 more figures
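
The captions for Figures 1, 2 and 4 describe the target sequence attending over a context set through cross-attention. The core operation can be sketched as follows, in the simplest case where each target time-step attends to one representation per context sequence (the "F" pattern in Figure 2). This is a minimal single-head illustration, not the authors' architecture: it omits the learned projections, VSN and FFN layers, and the array sizes and random inputs are hypothetical stand-ins for the encoded target and context representations.

```python
import numpy as np


def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def cross_attend(queries: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Single-head scaled dot-product cross-attention.

    queries: (l_t, d)   one query per target time-step
    keys:    (n_ctx, d) one key per context sequence
    values:  (n_ctx, d) one value per context sequence
    Returns the attended representation (l_t, d) and the weights (l_t, n_ctx).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


# Hypothetical sizes and random stand-ins for the encoded sequences.
rng = np.random.default_rng(0)
l_t, n_ctx, d = 5, 3, 8
target_hidden = rng.normal(size=(l_t, d))    # queries from the target sequence
context_keys = rng.normal(size=(n_ctx, d))   # keys: context inputs without next-day returns
context_values = rng.normal(size=(n_ctx, d)) # values: context inputs with next-day returns

attended, attention_weights = cross_attend(target_hidden, context_keys, context_values)
```

Each row of `attention_weights` sums to one and shows how strongly a given target time-step draws on each context sequence, which is the kind of quantity that makes the relationship between forecasts and context patterns inspectable.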