Chain-structured neural architecture search for financial time series forecasting

Denis Levchenko; Efstratios Rappos; Shabnam Ataee; Biagio Nigro; Stephan Robert-Nicoud

TL;DR

This work investigates neural architecture search (NAS) for financial time series forecasting in chain-structured search spaces. It compares Bayesian optimization, reinforcement learning, and Hyperband as search strategies over FFNN, 1D CNN, and RNN architectures, with TFT included for comparison. Bayesian optimization and Hyperband generally outperform the RL approach, and RNNs and 1D CNNs often deliver the strongest predictive performance, though results vary considerably across datasets because the data are non-stationary and noisy. Data scarcity and high variance in financial markets limit the gains from NAS alone, prompting recommendations for robust preprocessing (e.g., dimensionality reduction with PCA) and ensemble strategies. The results offer practical guidance on selecting NAS strategies and architectures for small, real-world time series datasets and suggest directions for extending NAS in finance to more scalable, structured search spaces and one-shot methods.

Abstract

Neural architecture search (NAS) emerged as a way to automatically optimize neural networks for a specific task and dataset. Despite an abundance of research on NAS for image and natural language applications, similar studies for time series data are lacking. Among NAS search spaces, chain-structured ones are the simplest and the most applicable to small datasets such as time series. We compare three popular NAS strategies on chain-structured search spaces in the context of financial time series forecasting: Bayesian optimization (specifically the Tree-structured Parzen Estimator), the hyperband method, and reinforcement learning. These strategies were employed to optimize simple, well-understood neural architectures such as the MLP, 1D CNN, and RNN, with the more complex temporal fusion transformers (TFT) and their own optimizers included for comparison. We find that Bayesian optimization and the hyperband method perform best among the strategies, and that RNN and 1D CNN perform best among the architectures, but all methods were very close to each other, with high variance due to the difficulty of working with financial datasets. We discuss our approach to overcoming the variance and provide implementation recommendations for future users and researchers.
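To make the idea of a chain-structured search space concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical space for a feedforward network in which each layer feeds only the next one, and it runs successive halving, the subroutine that the hyperband method repeats under different budget trade-offs. All names and values (`SEARCH_SPACE`, `sample_chain`, `toy_score`, the layer options) are illustrative assumptions, and `toy_score` is a synthetic stand-in for training a network and measuring validation performance.

```python
import random

# Hypothetical chain-structured search space (values are illustrative only).
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
    "dropout": [0.0, 0.1, 0.3],
}


def sample_chain(rng):
    """Draw one architecture: an ordered chain of independently sampled layers."""
    depth = rng.choice(SEARCH_SPACE["num_layers"])
    return [
        {
            "units": rng.choice(SEARCH_SPACE["units"]),
            "activation": rng.choice(SEARCH_SPACE["activation"]),
            "dropout": rng.choice(SEARCH_SPACE["dropout"]),
        }
        for _ in range(depth)
    ]


def toy_score(chain, budget, rng):
    """Synthetic placeholder for 'train with this budget, return a validation score'.

    The noise shrinks as the budget grows, mimicking more reliable estimates
    from longer training. It does not model any real dataset.
    """
    capacity = sum(layer["units"] for layer in chain)
    signal = 0.5 + 0.05 * min(capacity, 128) / 128
    noise = rng.gauss(0.0, 0.02 / budget ** 0.5)
    return signal + noise


def successive_halving(candidates, min_budget, eta, rng):
    """Keep the top 1/eta of candidates each round and multiply the budget by eta."""
    budget = min_budget
    while len(candidates) > 1:
        ranked = sorted(
            candidates, key=lambda c: toy_score(c, budget, rng), reverse=True
        )
        candidates = ranked[: max(1, len(candidates) // eta)]
        budget *= eta
    return candidates[0]


rng = random.Random(0)
population = [sample_chain(rng) for _ in range(27)]
best = successive_halving(population, min_budget=1, eta=3, rng=rng)
```

With 27 candidates and `eta=3`, the pool shrinks to 9, then 3, then 1, while the per-candidate budget grows from 1 to 3 to 9. In practice, `toy_score` would be replaced by actual training and evaluation of the sampled architecture.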

Paper Structure (21 sections, 4 figures, 3 algorithms)

Introduction
Data and problem formulation
Architecture types and their search spaces
Feedforward networks
Convolutional neural networks
Recurrent neural networks
Temporal fusion transformer
Search strategies
Bayesian optimization
Reinforcement learning approach
Hyperband
Methodology
Data preprocessing
Principal component analysis
Metrics
...and 6 more sections

Figures (4)

Figure 1: Scree plot for the Japan training dataset after time-derived features were removed. The curve saturates around $150$ components, suggesting that $k=150$ is a good choice to keep most of the information while reducing the dimensionality significantly.
Figure 2: Best-performing architectures selected by each search strategy on each dataset. Every point represents the average $AUC$ score and standard deviation on test data after retraining the selected architecture $50$ times. The type of neural network chosen by the search strategy is displayed above each point; search strategies are color-coded.
Figure 3: Bayesian optimization history for the ordinary 1D CNN architecture on the US dataset. Each point represents the average $AUC$ score on the validation dataset over $15$ runs for the same network configuration.
Figure 4: Slice plot for Bayesian optimization of LSTMs on the Japan dataset. Each point represents the average $AUC$ score on the validation dataset over $15$ runs for the same network configuration. Setting chunk_length (i.e. the length in time of a sequence passed to the LSTM for an individual prediction) to $10$ gives the best results, while the other parameters have less impact.
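
The captions above report an average $AUC$ score and its standard deviation over repeated training runs of the same configuration. The sketch below shows one way such a summary could be computed. It assumes binary labels, uses the pairwise-ranking definition of AUC, and uses the sample standard deviation; the function names and the toy data are hypothetical and do not come from the paper.

```python
from statistics import mean, stdev


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_runs(labels, runs):
    """Return the mean and sample standard deviation of AUC across runs."""
    aucs = [auc_score(labels, scores) for scores in runs]
    return mean(aucs), stdev(aucs)


# Toy data: one list of predicted scores per retraining run (illustrative only).
labels = [1, 0, 1, 0]
runs = [
    [0.9, 0.1, 0.8, 0.2],  # both positives outrank both negatives
    [0.6, 0.7, 0.8, 0.2],  # one positive is ranked below one negative
    [0.5, 0.5, 0.5, 0.5],  # every pair is tied
]
avg_auc, std_auc = summarize_runs(labels, runs)
```

Averaging over many runs of the same configuration reduces the influence of random initialization on the comparison between configurations, which matters when the differences between methods are small relative to the run-to-run variance.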
