Regularising NARX models with multi-task learning

Sarah Bee; Lawrence Bull; Nikolaos Dervilis; Keith Worden

TL;DR

The paper tackles overfitting and poor generalisation in NARX models for time-series prediction by introducing MT-NARX, which jointly predicts the current output and several lead-time outputs so that the extra tasks regularise learning. Using a Duffing oscillator as a nonlinear benchmark, the authors find that MT-NARX can reduce the NMSE under high noise (e.g., an NMSE of $6.0\%$ for MT-NARX vs $7.3\%$ for ST-NARX at $100\%$ noise), though gains are limited when input noise is present, and optimisation becomes sensitive to initialisation. They analyse the training dynamics, show variability due to random seeds, and propose loss-weighting strategies to bias the model toward the primary target output. The work demonstrates that non-operational lead features in MT-NARX can provide regularisation benefits for structural time-series problems, and it points to improved loss design as a key path toward stronger generalisation.
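The two quantities mentioned above, the NMSE and a weighted multi-task loss, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the NMSE below uses the percentage definition common in the structural dynamics literature (a score of 100 corresponds to always predicting the mean), which is assumed here because the paper's own definition is not reproduced in this summary. The function names and the weights `[0.8, 0.2]` are hypothetical.

```python
def nmse_percent(y_true, y_pred):
    """NMSE in percent: 100 * sum((y - y_hat)^2) / (N * var(y)).

    Assumed definition; 0 is a perfect fit, 100 matches predicting the mean.
    """
    n = len(y_true)
    mean = sum(y_true) / n
    variance = sum((v - mean) ** 2 for v in y_true) / n
    squared_error = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return 100.0 * squared_error / (n * variance)


def weighted_multitask_loss(targets, predictions, weights):
    """Weighted sum of per-task mean squared errors.

    targets and predictions are lists of rows with one column per task;
    column 0 is taken to be the primary (current-time) output.
    """
    n = len(targets)
    total = 0.0
    for task, weight in enumerate(weights):
        mse = sum((t[task] - p[task]) ** 2 for t, p in zip(targets, predictions)) / n
        total += weight * mse
    return total


# Predicting the mean of the signal gives an NMSE of 100 under this definition.
signal = [1.0, 2.0, 3.0, 4.0]
mean_score = nmse_percent(signal, [2.5] * 4)

# Hypothetical weights that favour the primary output over one lead-time output.
loss = weighted_multitask_loss(
    targets=[[1.0, 2.0], [3.0, 4.0]],
    predictions=[[1.0, 3.0], [3.0, 5.0]],
    weights=[0.8, 0.2],
)
```

Raising the first weight relative to the others is one simple way to bias training toward the current-time output while still letting the lead-time tasks act as a regulariser.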

Abstract

A Nonlinear Auto-Regressive with eXogenous inputs (NARX) model can be used to describe time-varying processes in which the output depends on both previous outputs and current and previous external input variables. One limitation of NARX models is their propensity to overfit, which results in poor generalisation for future predictions. The method proposed here to help overcome overfitting is a NARX model that predicts outputs at both the current time and several lead times into the future. This is a form of multi-task learning (MTL), whereby the lead-time outputs regularise the current-time output. This work shows that, at high noise levels, MTL can regularise NARX models and achieve a lower Normalised Mean Square Error (NMSE) than the independent-learner counterpart.
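To make the input and output structure concrete, the sketch below builds training pairs for such a model from an input series `x` and an output series `y`. It is an illustration under stated assumptions rather than the authors' pipeline: the helper name `make_mt_narx_dataset`, the use of the same number of lags for inputs and outputs, and the specific lag and lead counts are all hypothetical.

```python
def make_mt_narx_dataset(x, y, n_lags=2, n_leads=2):
    """Build (features, targets) pairs for a multi-task NARX model.

    For each time index t:
      features = [y[t-1], ..., y[t-n_lags], x[t], x[t-1], ..., x[t-n_lags]]
      targets  = [y[t], y[t+1], ..., y[t+n_leads]]
    Setting n_leads=0 recovers the usual single-output NARX targets.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    features, targets = [], []
    # Start late enough to have all lags, stop early enough to have all leads.
    for t in range(n_lags, len(y) - n_leads):
        past_outputs = [y[t - k] for k in range(1, n_lags + 1)]
        inputs = [x[t - k] for k in range(0, n_lags + 1)]
        features.append(past_outputs + inputs)
        targets.append([y[t + k] for k in range(0, n_leads + 1)])
    return features, targets


# Toy series chosen so the indexing is easy to follow by eye.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
features, targets = make_mt_narx_dataset(x, y, n_lags=2, n_leads=2)
```

The first target column is the current-time output; the remaining columns are the lead-time outputs that are only used during training to regularise the shared model.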

Paper Structure (9 sections, 4 equations, 6 figures, 2 tables)

Introduction
Development of case study
Base dataset
Representation of noise
NN development
Performance metrics
Hyper-parameter training
Results
Discussion

Figures (6)

Figure 1: Example of a standard NARX neural network (left) and an MT-NARX neural network (right).
Figure 2: SDOF Duffing oscillator with linear and cubic stiffness $k$ and $k_3$, respectively, mass $m$ and damping coefficient $c$.
Figure 3: The system response over time with no noise (pink), 10% noise (minor noise, blue), and high noise (green).
Figure 4: Optimal model for ST-NARX (green) and MT-NARX (blue) for different noise cases, alongside the solution with no noise (pink), for trial 1.
Figure 5: NMSE for ST-NARX (green) and MT-NARX (blue) for different noise levels (1%, 10%, 30%, 50% and 100%) for the three trials.
...and 1 more figures
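
The benchmark in Figure 2 is a single-degree-of-freedom Duffing oscillator, whose standard equation of motion is $m\ddot{y} + c\dot{y} + ky + k_3 y^3 = x(t)$, with $x(t)$ the external force. The sketch below integrates it with a fixed-step fourth-order Runge-Kutta scheme. It is an assumption-laden illustration: the integrator, the function name `simulate_duffing`, and all default parameter values are hypothetical and are not taken from the paper.

```python
import math


def simulate_duffing(force, dt, m=1.0, c=20.0, k=1.0e4, k3=5.0e9, y0=0.0, v0=0.0):
    """Integrate m*y'' + c*y' + k*y + k3*y^3 = force(t) with fixed-step RK4.

    force holds the forcing sampled every dt seconds; the forcing at the
    midpoint of each step is approximated by linear interpolation.
    Returns the displacement at every sample time.
    """
    def accel(y, v, f):
        return (f - c * v - k * y - k3 * y ** 3) / m

    y, v = y0, v0
    displacement = [y]
    for i in range(len(force) - 1):
        f_start, f_end = force[i], force[i + 1]
        f_mid = 0.5 * (f_start + f_end)
        # Four RK4 stages for the state (y, v).
        dy1, dv1 = v, accel(y, v, f_start)
        dy2, dv2 = v + 0.5 * dt * dv1, accel(y + 0.5 * dt * dy1, v + 0.5 * dt * dv1, f_mid)
        dy3, dv3 = v + 0.5 * dt * dv2, accel(y + 0.5 * dt * dy2, v + 0.5 * dt * dv2, f_mid)
        dy4, dv4 = v + dt * dv3, accel(y + dt * dy3, v + dt * dv3, f_end)
        y += dt * (dy1 + 2.0 * dy2 + 2.0 * dy3 + dy4) / 6.0
        v += dt * (dv1 + 2.0 * dv2 + 2.0 * dv3 + dv4) / 6.0
        displacement.append(y)
    return displacement


# Sanity check on the linear, undamped special case, where y(t) = cos(t).
free_response = simulate_duffing(
    [0.0] * 629, dt=0.01, m=1.0, c=0.0, k=1.0, k3=0.0, y0=1.0, v0=0.0
)
```

Setting `k3=0.0` reduces the model to a linear oscillator, which gives a convenient closed-form check before the cubic stiffness term is switched on.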