State estimator design using Jordan based long short-term memory networks

Avneet Kaur, Kirsten Morris

TL;DR

The paper addresses nonlinear state estimation by combining model-based insight with data-driven learning, introducing Jordan-based long short-term memory networks (JLSTM) as a faster-training alternative to ELSTM for estimating system states from noisy observations. It formalizes both ERN/JRN and ELSTM/JLSTM architectures, proves universal approximation capabilities for JRNs in state estimation, and provides implementation details and training protocols. Through three numerical experiments (one linear, two nonlinear), JLSTM and ELSTM outperform traditional EKF in nonlinear settings and show NMSE comparable to KF in linear cases, with JLSTM achieving similar accuracy to ELSTM at significantly reduced training times. The results suggest JLSTM as a practical, scalable approach for nonlinear state estimation, with potential stability advantages to explore in future work.

Abstract

State estimation of a dynamical system refers to estimating the state of a system given an imperfect model, noisy measurements and some or no information about the initial state. While Kalman filtering is optimal for estimation of linear systems with Gaussian noise, calculating optimal estimators for nonlinear systems is challenging. We focus on establishing a pathway to optimal estimation of high-order systems by using recurrent connections motivated by Jordan recurrent neural networks (JRNs). The results are compared to the corresponding Elman structure based long short-term memory network (ELSTM), and to the KF for linear systems and the EKF for nonlinear systems. The results suggest that for nonlinear systems, the use of long short-term memory networks can improve both estimation error and computation time. In addition, the Jordan based long short-term memory networks (JLSTMs) require less training to achieve performance similar to ELSTMs.
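As a point of reference for the linear baseline mentioned in the abstract, a single predict/update cycle of a textbook discrete-time Kalman filter can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `kalman_step` and the symbols `A`, `C`, `Q`, `R` (state matrix, measurement matrix, process and measurement noise covariances) are assumptions introduced here, while `y` and `x_hat` follow the measurement and estimate notation used in the figure captions.

```python
import numpy as np

def kalman_step(x_hat, P, y, A, C, Q, R):
    """One predict/update cycle of a standard discrete-time Kalman filter.

    All names except y and x_hat are illustrative assumptions:
    A, C are the state and measurement matrices, Q, R the noise covariances,
    and P is the covariance of the current estimate x_hat.
    """
    # Predict: propagate the estimate and its covariance through the model.
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + Q

    # Innovation covariance and Kalman gain K = P_pred C^T S^{-1}.
    # solve() avoids an explicit inverse; the transpose trick relies on
    # S and P_pred being symmetric.
    S = C @ P_pred @ C.T + R
    K = np.linalg.solve(S, C @ P_pred).T

    # Update: correct the prediction with the measurement residual.
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(x_hat.shape[0]) - K @ C) @ P_pred
    return x_new, P_new
```

Calling `kalman_step` once per measurement, feeding the returned pair back in, gives the recursive structure that the output-to-hidden connections of the Jordan networks are meant to resemble.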

Paper Structure

This paper contains 10 sections, 3 theorems, 21 equations, 10 figures, 4 tables.

Key Result

Theorem 2.8

(Universal approximation theorem for feed-forward neural networks [hornik1989multilayer]) For any activation function $\sigma:\mathbb{R} \rightarrow [0,1]$, any dimension $n$ and any probability measure $\mu$ on $(\mathbb{R}^{n},\mathbb{B}^{n})$, $\Sigma^{n}(\sigma)$ is uniformly dense on a compact doma…

Figures (10)

  • Figure 1: The structure of an Elman recurrent neural network (ERN) for state estimation. It uses hidden-to-hidden recurrent connections. The symbols $y^{(t)}, a^{(t)}, x^{(t)}$ and $\hat{x}^{(t)}$ represent the input measurement vector, hidden layer vector, true state vector and estimated state vector, respectively, at time $t$. The cost function $J$ is the mean squared error (MSE). The weights and biases are represented by $W_{ay}, W_{aa}, W_{xa}, b_a$ and $b_x$.
  • Figure 2: The structure of the Jordan recurrent network (JRN) proposed for state estimation. It uses output-to-hidden recurrent connections similar to the filter's dynamical system. The symbols $y^{(t)}, a^{(t)}, x^{(t)}$ and $\hat{x}^{(t)}$ represent the input measurement vector, hidden layer vector, true state vector and estimated state vector, respectively, at time $t$. The cost function $J^{(t)}$ is the mean squared error at time $t$. The weights and biases of the network are represented by $W_{ay}, W_{ax}, W_{xa}, b_x$ and $b_y$.
  • Figure 3: Structure of an Elman long short-term memory network (ELSTM) for state estimation. It uses hidden-to-hidden recurrent connections for state estimation of a nonlinear dynamical system. The symbols $y^{(t)}, a^{(t)}$ and $\hat{x}^{(t)}$ represent the input measurement vector, hidden layer vector and estimated state vector, respectively. The forget, input and output gates are represented by $f^{(t)}, i^{(t)}$ and $o^{(t)}$. The cell state vector and the cell input activation vector are represented by $c^{(t)}$ and $\tilde{c}^{(t)}$, respectively. The weights and biases are represented by $W_{fy}, W_{fa}, W_{iy}, W_{ia}, W_{oy}, W_{oa}, W_{\tilde{c}y}, W_{\tilde{c}a}, W_{xa}, b_f, b_i, b_o$ and $b_{\tilde{c}}$.
  • Figure 4: Structure of a Jordan long short-term memory network (JLSTM) for state estimation. It uses output-to-hidden recurrent connections. The symbols $y^{(t)}, a^{(t)}$ and $\hat{x}^{(t)}$ represent the input measurement vector, hidden layer vector and estimated state vector, respectively. The forget, input and output gates are represented by $f^{(t)}, i^{(t)}$ and $o^{(t)}$. The cell state vector and the cell input activation vector are represented by $c^{(t)}$ and $\tilde{c}^{(t)}$, respectively. The weights and biases are represented by $W_{fy}, W_{fx}, W_{iy}, W_{ix}, W_{oy}, W_{ox}, W_{\tilde{c}y}, W_{\tilde{c}x}, W_{xa}, b_f, b_i, b_o$ and $b_{\tilde{c}}$.
  • Figure 5: Comparison of KF, JLSTM and ELSTM performance, using the average errors over $10$ test sequences of $50$ seconds each, for a system of $10$ connected springs with a noisy Gaussian initial condition.
  • ...and 5 more figures
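
The difference between the Elman and Jordan recurrences described in the captions for Figures 1, 2 and 4 can be sketched in a few lines of NumPy. This is an illustrative reading of the captions, not the paper's code. Several choices are assumptions: the tanh hidden activation, the standard LSTM gate equations with the previous estimate $\hat{x}^{(t-1)}$ substituted for the previous hidden vector, the hidden bias name `b_a` in the Jordan network, the output map without a bias in the JLSTM (the Figure 4 caption lists none), and the helper `init_params` with its random initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_y, n_a, n_x, rng):
    """Hypothetical helper: small random weights, zero biases."""
    p = {
        "W_ay": rng.normal(0, 0.1, (n_a, n_y)),
        "W_aa": rng.normal(0, 0.1, (n_a, n_a)),  # Elman: hidden-to-hidden
        "W_ax": rng.normal(0, 0.1, (n_a, n_x)),  # Jordan: output-to-hidden
        "W_xa": rng.normal(0, 0.1, (n_x, n_a)),
        "b_a": np.zeros(n_a),
        "b_x": np.zeros(n_x),
    }
    for g in ("f", "i", "o", "c"):  # forget, input, output gates; cell input
        p[f"W_{g}y"] = rng.normal(0, 0.1, (n_a, n_y))
        p[f"W_{g}x"] = rng.normal(0, 0.1, (n_a, n_x))
        p[f"b_{g}"] = np.zeros(n_a)
    return p

def ern_step(y, a_prev, p):
    """Elman step: the previous hidden vector is fed back."""
    a = np.tanh(p["W_ay"] @ y + p["W_aa"] @ a_prev + p["b_a"])
    x_hat = p["W_xa"] @ a + p["b_x"]
    return x_hat, a

def jrn_step(y, x_hat_prev, p):
    """Jordan step: the previous state estimate is fed back."""
    a = np.tanh(p["W_ay"] @ y + p["W_ax"] @ x_hat_prev + p["b_a"])
    x_hat = p["W_xa"] @ a + p["b_x"]
    return x_hat, a

def jlstm_step(y, x_hat_prev, c_prev, p):
    """Jordan LSTM step: gates see the measurement and the previous estimate."""
    f = sigmoid(p["W_fy"] @ y + p["W_fx"] @ x_hat_prev + p["b_f"])
    i = sigmoid(p["W_iy"] @ y + p["W_ix"] @ x_hat_prev + p["b_i"])
    o = sigmoid(p["W_oy"] @ y + p["W_ox"] @ x_hat_prev + p["b_o"])
    c_tilde = np.tanh(p["W_cy"] @ y + p["W_cx"] @ x_hat_prev + p["b_c"])
    c = f * c_prev + i * c_tilde      # cell state update
    a = o * np.tanh(c)                # hidden layer vector
    x_hat = p["W_xa"] @ a             # estimate (no output bias assumed)
    return x_hat, a, c

# Usage: run the JLSTM over a short random measurement sequence.
rng = np.random.default_rng(0)
n_y, n_a, n_x = 2, 8, 4
params = init_params(n_y, n_a, n_x, rng)
x_hat, c = np.zeros(n_x), np.zeros(n_a)
for y_t in rng.normal(size=(5, n_y)):
    x_hat, a, c = jlstm_step(y_t, x_hat, c, params)
```

In this sketch the recurrent weights of the Jordan variants have shape `(n_a, n_x)` rather than `(n_a, n_a)`, so when the state dimension is smaller than the hidden dimension there are fewer recurrent parameters to train. Whether this accounts for the reduced training time reported for JLSTM is not stated in the text above.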

Theorems & Definitions (11)

  • Definition 2.1: Activation function
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Definition 2.6
  • Definition 2.7
  • Theorem 2.8
  • Corollary 2.9
  • Definition 2.10
  • ...and 1 more