Chain-structured neural architecture search for financial time series forecasting

Denis Levchenko, Efstratios Rappos, Shabnam Ataee, Biagio Nigro, Stephan Robert-Nicoud

TL;DR

This work investigates neural architecture search for financial time series forecasting within chain-structured spaces, evaluating Bayesian optimization, reinforcement learning, and Hyperband across FFNN, 1D CNN, RNN, and TFT baselines. The study finds that Bayesian optimization and Hyperband generally outperform the RL approach, with RNNs and 1D CNNs often delivering the strongest predictive performance, though results vary significantly across datasets due to the data's non-stationarity and noise. Data scarcity and high variance in financial markets limit the gains from NAS alone, prompting recommendations for robust preprocessing (e.g., feature dimensionality reduction, PCA) and ensemble strategies. The results offer practical guidance on selecting NAS strategies and architectures for small, real-world time-series datasets and suggest directions for extending NAS to more scalable, structured search spaces and one-shot methods in finance.

Abstract

Neural architecture search (NAS) emerged as a way to automatically optimize neural networks for a specific task and dataset. Despite abundant research on NAS for image and natural language applications, similar studies for time series data are lacking. Among NAS search spaces, chain-structured ones are the simplest and the most applicable to small datasets such as time series. We compare three popular NAS strategies on chain-structured search spaces in the context of financial time series forecasting: Bayesian optimization (specifically the Tree-structured Parzen Estimator), the Hyperband method, and reinforcement learning. These strategies were used to optimize simple, well-understood neural architectures such as the MLP, 1D CNN, and RNN, with more complex temporal fusion transformers (TFT) and their own optimizers included for comparison. We find that Bayesian optimization and the Hyperband method perform best among the strategies, and that the RNN and 1D CNN perform best among the architectures, but all methods were very close to each other, with high variance due to the difficulty of working with financial datasets. We discuss our approach to overcoming the variance and provide implementation recommendations for future users and researchers.
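
To make the idea of a chain-structured search space concrete, the following is a minimal sketch in which an architecture is simply an ordered list of layers, each feeding only the next. The hyperparameter names and ranges in `SEARCH_SPACE`, the helper names `sample_chain` and `random_search`, and the `evaluate` callback are all hypothetical and are not taken from the paper. Plain random search is used here only as a stand-in for a search strategy; it is not the Bayesian optimization, Hyperband, or reinforcement learning procedure studied by the authors.

```python
import random

# Hypothetical search space: names and ranges are illustrative assumptions,
# not the values used in the paper.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
    "dropout": [0.0, 0.1, 0.3],
}


def sample_chain(rng):
    """Sample one chain-structured architecture.

    The result is an ordered list of layer configurations; layer i receives
    its input only from layer i - 1, so there are no skip connections or branches.
    """
    depth = rng.choice(SEARCH_SPACE["num_layers"])
    return [
        {
            "units": rng.choice(SEARCH_SPACE["units"]),
            "activation": rng.choice(SEARCH_SPACE["activation"]),
            "dropout": rng.choice(SEARCH_SPACE["dropout"]),
        }
        for _ in range(depth)
    ]


def random_search(evaluate, n_trials, seed=0):
    """Stand-in search strategy: keep the sampled chain with the highest score.

    `evaluate` is a user-supplied function mapping an architecture to a
    validation score (higher is better), e.g. a validation AUC.
    """
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_chain(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

In practice `evaluate` would train the sampled network and return a validation metric; a smarter strategy would replace the uniform sampling inside `random_search` while keeping the same list-of-layers representation.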

Paper Structure (21 sections, 4 figures, 3 algorithms)

Figures (4)

  • Figure 1: Scree plot for the Japan training dataset after time-derived features were removed. The curve saturates around $150$ components, suggesting that $k=150$ is a good choice to keep most of the information while reducing the dimensionality significantly.
  • Figure 2: Best-performing architectures selected by each search strategy on each dataset. Every point represents the average $AUC$ score and standard deviation on test data after retraining the selected architecture $50$ times. The type of neural network chosen by the search strategy is displayed above each point; search strategies are color-coded.
  • Figure 3: Bayesian optimization history for the ordinary 1D CNN architecture on the US dataset. Each point represents the average $AUC$ score on the validation dataset over $15$ runs for the same network configuration.
  • Figure 4: Slice plot for Bayesian optimization of LSTMs on the Japan dataset. Each point represents the average $AUC$ score on the validation dataset over $15$ runs for the same network configuration. Setting chunk_length (i.e. the length in time of a sequence passed to the LSTM for individual prediction) to $10$ clearly gives the best results, while the other parameters have less impact.
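
The figures above report an average $AUC$ and its standard deviation over repeated training runs of the same configuration. As a rough illustration of that kind of summary, here is a minimal sketch using only the Python standard library. The function names `auc_score` and `summarize_runs` are hypothetical, the pairwise definition of AUC is a standard one but not necessarily the implementation the authors used, and the use of the sample standard deviation (rather than the population one) is an assumption.

```python
from statistics import mean, stdev


def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_runs(labels, runs):
    """Return (mean AUC, sample standard deviation) across repeated runs.

    `runs` holds one list of predicted scores per training run, all scored
    against the same `labels`. At least two runs are required by `stdev`.
    """
    aucs = [auc_score(labels, scores) for scores in runs]
    return mean(aucs), stdev(aucs)
```

Averaging over many retrainings in this way is what makes configurations comparable when a single run is dominated by noise; the quadratic pairwise loop is adequate for small evaluation sets but would be replaced by a rank-based computation for large ones.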