Evaluating the Sensitivity of BiLSTM Forecasting Models to Sequence Length and Input Noise
Salma Albelali, Moataz Ahmed
TL;DR
This study systematically examines how two data-centric factors, input sequence length and additive Gaussian noise, affect BiLSTM time-series forecasting. It uses a modular, reproducible forecasting pipeline evaluated on three weather datasets with different sampling frequencies. Controlled experiments covering baseline short sequences, extended sequences, noise injection, and their interaction show that longer sequences increase overfitting and data leakage risk in data-scarce settings, while additive noise consistently degrades predictive accuracy. The combination of both factors produces the most substantial loss in robustness, though high-frequency data show greater resilience. The findings highlight the need for dataset-aware design and testing practices in DL-based forecasting. The authors propose future directions including adaptive sequence configuration, preprocessing interactions, multivariate and irregular data handling, and explainable AI integration to improve reliability in critical applications.
Abstract
Deep learning (DL) models, a specialized class of multilayer neural networks, have become central to time-series forecasting in critical domains such as environmental monitoring and the Internet of Things (IoT). Among these, Bidirectional Long Short-Term Memory (BiLSTM) architectures are particularly effective at capturing complex temporal dependencies. However, the robustness and generalization of such models are highly sensitive to input data characteristics, an aspect that remains underexplored in the existing literature. This study presents a systematic empirical analysis of two key data-centric factors: input sequence length and additive noise. To support this investigation, a modular and reproducible forecasting pipeline is developed, incorporating standardized preprocessing, sequence generation, model training, validation, and evaluation. Controlled experiments are conducted on three real-world datasets with varying sampling frequencies to assess BiLSTM performance under different input conditions. The results yield three key findings: (1) longer input sequences significantly increase the risk of overfitting and data leakage, particularly in data-constrained environments; (2) additive noise consistently degrades predictive accuracy across sampling frequencies; and (3) the simultaneous presence of both factors results in the most substantial decline in model stability. Although datasets with higher observation frequencies exhibit greater robustness, they remain vulnerable when both input challenges are present. These findings highlight important limitations in current DL-based forecasting pipelines and underscore the need for data-aware design strategies. This work contributes to a deeper understanding of DL model behavior in dynamic time-series environments and provides practical insights for developing more reliable and generalizable forecasting systems.
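The two input factors studied here, sequence length and additive Gaussian noise, can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the function names (`make_windows`, `add_gaussian_noise`, `chronological_split`), the synthetic sine series, the 80/20 split, the window lengths of 10 and 50, and the noise level `sigma=0.1` are all assumptions chosen for illustration. The sketch splits the series chronologically before windowing, which is one common way to keep test observations out of training windows.

```python
import numpy as np


def chronological_split(series, train_frac=0.8):
    """Split a series by time order (no shuffling), so that no
    training window can contain test-period observations."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def make_windows(series, seq_len, horizon=1):
    """Turn a 1-D series into sliding-window inputs and targets.

    Returns X with shape (samples, seq_len, 1), the layout recurrent
    layers typically expect, and y with shape (samples,).
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - seq_len - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this seq_len and horizon")
    X = np.stack([series[i:i + seq_len] for i in range(n)])
    y = np.array([series[i + seq_len + horizon - 1] for i in range(n)])
    return X[..., np.newaxis], y


def add_gaussian_noise(x, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, sigma, size=x.shape)


# Hypothetical stand-in for a weather series (illustrative only).
series = np.sin(np.linspace(0, 20, 200))
train, test = chronological_split(series)

# Assumed "short" and "long" window lengths.
X_short, y_short = make_windows(train, seq_len=10)
X_long, y_long = make_windows(train, seq_len=50)

# Assumed noise level applied to the input windows.
X_noisy = add_gaussian_noise(X_short, sigma=0.1)

print(X_short.shape, X_long.shape, X_noisy.shape)
```

With a fixed-length training series, the longer window leaves fewer training samples (110 versus 150 in this sketch), which is worth keeping in mind when comparing sequence lengths on small datasets.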
