Simple Feedforward Neural Networks are Almost All You Need for Time Series Forecasting
Fan-Keng Sun, Yu-Cheng Wu, Duane S. Boning
TL;DR
This work argues that simple feedforward neural networks (SFNNs), when equipped with a few carefully chosen enhancements, can rival state-of-the-art time series forecasting models such as Transformers. The proposed SFNN architecture applies a single univariate core shared across all series, with optional modules for input mean centering, series-wise non-linear mapping, and layer normalization to boost performance. Through extensive experiments and a rigorous ablation study, the authors show that SFNNs achieve state-of-the-art results on many datasets, support their robustness claim by showing gains from longer look-back windows, and identify limitations on certain datasets such as Traffic. They also critique current benchmarking practices and propose a fair evaluation protocol, positioning SFNNs as a strong baseline against which future work should be rigorously compared.
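To make the shared univariate core concrete, here is a minimal NumPy sketch of the forward pass only. It is not the authors' implementation: the single hidden layer, the ReLU activation, adding the input mean back to the forecast, and the names `init_params`, `sfnn_forward`, `look_back`, `hidden`, and `horizon` are all assumptions made for illustration. The series-wise non-linear mapping is omitted because the summary does not describe its form.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    # Normalize each row over the feature axis (no learned scale or shift).
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def init_params(look_back, hidden, horizon, seed=0):
    # Hypothetical one-hidden-layer core; sizes are illustrative only.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(look_back), (look_back, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, horizon)),
        "b2": np.zeros(horizon),
    }

def sfnn_forward(x, params, center=True, use_layer_norm=True):
    """Map x of shape (num_series, look_back) to (num_series, horizon).

    Every series (row) passes through the same weights, so no
    information is exchanged between series.
    """
    # Optional input mean centering, computed per series over the window.
    mean = x.mean(axis=-1, keepdims=True) if center else 0.0
    h = (x - mean) @ params["W1"] + params["b1"]
    if use_layer_norm:
        h = layer_norm(h)
    h = np.maximum(h, 0.0)  # ReLU (assumed activation)
    # Assumption: the removed mean is added back to the forecast.
    return h @ params["W2"] + params["b2"] + mean

params = init_params(look_back=8, hidden=16, horizon=4)
x = np.random.default_rng(1).normal(size=(3, 8))  # 3 series, 8 past steps
y = sfnn_forward(x, params)                       # 3 series, 4 future steps
```

Because the weights are shared, reordering the input series simply reorders the forecasts, and with centering as sketched here, shifting a series by a constant shifts its forecast by the same constant.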
Abstract
Time series data are everywhere, from finance to healthcare, and each domain brings its own complexities and structures. While advanced models like Transformers and graph neural networks (GNNs) have gained popularity in time series forecasting, largely due to their success in tasks like language modeling, their added complexity is not always necessary. In our work, we show that simple feedforward neural networks (SFNNs) can achieve performance on par with, or even exceeding, these state-of-the-art models, while being simpler, smaller, faster, and more robust. Our analysis indicates that, in many cases, univariate SFNNs are sufficient, implying that modeling interactions between multiple series may offer only marginal benefits. Even when inter-series relationships are strong, a basic multivariate SFNN still delivers competitive results. We also examine some key design choices and offer guidelines for making informed decisions. Additionally, we critique existing benchmarking practices and propose an improved evaluation protocol. Although SFNNs may not be optimal for every situation (hence the "almost" in our title), they serve as a strong baseline that future time series forecasting methods should always be compared against.
