Chain-structured neural architecture search for financial time series forecasting
Denis Levchenko, Efstratios Rappos, Shabnam Ataee, Biagio Nigro, Stephan Robert-Nicoud
TL;DR
This work investigates neural architecture search (NAS) for financial time series forecasting within chain-structured search spaces. It compares Bayesian optimization, reinforcement learning, and Hyperband as search strategies for feedforward (MLP), 1D CNN, and RNN architectures, with temporal fusion transformers (TFT) included for comparison. Bayesian optimization and Hyperband generally outperform the reinforcement learning approach, and RNNs and 1D CNNs often deliver the strongest predictive performance, though results vary substantially across datasets because financial data are non-stationary and noisy. Data scarcity and high variance in financial markets limit the gains from NAS alone, prompting recommendations for robust preprocessing (e.g., feature dimensionality reduction such as PCA) and ensemble strategies. The results offer practical guidance on selecting NAS strategies and architectures for small, real-world time series datasets, and suggest directions for extending NAS in finance to more scalable, structured search spaces and one-shot methods.
Abstract
Neural architecture search (NAS) has emerged as a way to automatically optimize neural networks for a specific task and dataset. Despite abundant research on NAS for image and natural language applications, similar studies for time series data are lacking. Among NAS search spaces, chain-structured ones are the simplest and the most applicable to small datasets such as time series. We compare three popular NAS strategies on chain-structured search spaces in the context of financial time series forecasting: Bayesian optimization (specifically the Tree-structured Parzen Estimator), the Hyperband method, and reinforcement learning. These strategies were used to optimize simple, well-understood neural architectures such as the MLP, 1D CNN, and RNN, with the more complex temporal fusion transformers (TFT) and their own optimizers included for comparison. We find that Bayesian optimization and the Hyperband method perform best among the strategies, and that RNN and 1D CNN perform best among the architectures, but all methods were very close to each other, with high variance due to the difficulty of working with financial datasets. We discuss our approach to overcoming this variance and provide implementation recommendations for future users and researchers.
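
To make the terminology concrete, the following is a minimal sketch of a chain-structured search space searched with a single successive-halving bracket, the building block that Hyperband repeats with different starting budgets. It is an illustration under stated assumptions, not the configuration used in the paper: the layer widths, activations, maximum depth, and the `toy_validation_loss` function are all hypothetical, and the loss is a synthetic stand-in for actually training a network on a financial time series.

```python
import random

# Hypothetical chain-structured search space: an architecture is an ordered
# list of layers, and each layer is described only by its own hyperparameters.
LAYER_WIDTHS = [8, 16, 32, 64]
ACTIVATIONS = ["relu", "tanh"]
MAX_DEPTH = 4


def sample_architecture(rng):
    """Draw one chain: pick a depth, then a width and activation per layer."""
    depth = rng.randint(1, MAX_DEPTH)
    return [
        {"units": rng.choice(LAYER_WIDTHS), "activation": rng.choice(ACTIVATIONS)}
        for _ in range(depth)
    ]


def toy_validation_loss(arch, budget, rng):
    """Synthetic stand-in for training `arch` for `budget` epochs.

    Assumption for illustration only: the loss is lowest when the total
    number of units is near 64, and the evaluation noise shrinks as the
    budget grows. A real study would train and validate a model here.
    """
    capacity = sum(layer["units"] for layer in arch)
    bias = abs(capacity - 64) / 64.0
    noise = rng.gauss(0.0, 1.0) / budget
    return bias + noise


def successive_halving(n_configs=16, min_budget=1, eta=2, seed=0):
    """Run one successive-halving bracket and return (best_arch, history).

    Each round evaluates all surviving candidates at the current budget,
    keeps the best 1/eta of them, and multiplies the budget by eta.
    `history` records (budget, number of candidates evaluated) per round.
    """
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(n_configs)]
    budget = min_budget
    history = []
    while len(candidates) > 1:
        scored = [(toy_validation_loss(a, budget, rng), a) for a in candidates]
        scored.sort(key=lambda pair: pair[0])
        history.append((budget, len(candidates)))
        keep = max(1, len(candidates) // eta)
        candidates = [arch for _, arch in scored[:keep]]
        budget *= eta
    return candidates[0], history


if __name__ == "__main__":
    best, history = successive_halving()
    print("rounds (budget, candidates):", history)
    print("selected chain:", best)
```

Because early rounds use small budgets, their rankings are noisy, which mirrors the high-variance setting described above: a promising architecture can be discarded early by chance. Replacing the random sampler with a model-based proposal step would move the sketch toward Bayesian optimization, at the cost of more bookkeeping.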
