Regularising NARX models with multi-task learning
Sarah Bee, Lawrence Bull, Nikolaos Dervilis, Keith Worden
TL;DR
The paper tackles overfitting and poor generalisation in NARX models for time-series prediction by introducing MT-NARX, which jointly predicts the current output and lead-time outputs to regularise learning. Using a Duffing oscillator as a nonlinear benchmark, the authors find that MT-NARX can reduce the NMSE under high noise (e.g., an NMSE of $6.0\%$ for MT-NARX vs $7.3\%$ for ST-NARX at $100\%$ noise), though gains are limited when input noise is present, and optimisation becomes sensitive to initialisation. They analyse the training dynamics, show variability due to random seeds, and propose loss-weighting strategies to bias the model toward the primary target output. The work demonstrates that non-operational lead features in MT-NARX can provide regularisation benefits for structural time-series problems, pointing to loss-design improvements as a key path toward stronger generalisation.
Abstract
A Nonlinear Auto-Regressive with eXogenous inputs (NARX) model can be used to describe time-varying processes in which the output depends on both previous outputs and current and previous external input variables. One limitation of NARX models is their propensity to overfit, which results in poor generalisation for future predictions. To help overcome this overfitting, the proposed method is a NARX model that predicts outputs at both the current time and several lead times into the future. This is a form of multi-task learning (MTL), whereby the lead-time outputs regularise the current-time output. This work shows that, at high noise levels, MTL can regularise a NARX model, yielding a lower Normalised Mean Square Error (NMSE) than the independent-learner counterpart.
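As an illustration of the data layout this implies, the following is a minimal sketch, not the authors' implementation. It builds lagged NARX regressors together with a multi-task target matrix whose first column is the current output and whose remaining columns are lead-time outputs. The function names (`make_mt_narx_data`, `nmse`, `weighted_mt_loss`), the lag and lead counts, and the percentage form of the NMSE (mean squared error divided by the variance of the true signal, times 100) are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def make_mt_narx_data(x, y, n_lags=2, n_leads=2):
    """Build NARX regressors and multi-task targets (hypothetical helper).

    For each usable time t:
      features = [y[t-1], ..., y[t-n_lags], x[t], x[t-1], ..., x[t-n_lags]]
      targets  = [y[t], y[t+1], ..., y[t+n_leads]]
    Column 0 of targets is the primary (current-time) task.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Times with enough history for the lags and enough future for the leads.
    t_idx = np.arange(n_lags, len(y) - n_leads)
    y_lags = np.stack([y[t_idx - k] for k in range(1, n_lags + 1)], axis=1)
    x_lags = np.stack([x[t_idx - k] for k in range(0, n_lags + 1)], axis=1)
    features = np.hstack([y_lags, x_lags])
    targets = np.stack([y[t_idx + h] for h in range(0, n_leads + 1)], axis=1)
    return features, targets


def nmse(y_true, y_pred):
    """NMSE in percent (assumed definition): 100 * MSE / var(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean((y_true - y_pred) ** 2) / np.var(y_true)


def weighted_mt_loss(targets, preds, weights):
    """Weighted sum of per-task mean squared errors (assumed loss form).

    A larger weight on column 0 biases training toward the current-time task.
    """
    per_task_mse = np.mean((np.asarray(targets) - np.asarray(preds)) ** 2, axis=0)
    return float(np.dot(weights, per_task_mse))
```

A single-task NARX model in this layout would be trained on `targets[:, 0]` alone, while the multi-task variant fits all columns jointly and is evaluated only on the first. Under the assumed definition, predicting the mean of the signal everywhere gives an NMSE of 100%.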
