Stability properties of Minimal Gated Unit neural networks

Stefano De Carli; Davide Previtali; Mirko Mazzoleni; Fabio Previdi

Abstract

In this work, we address the need for efficient and formally stable Recurrent Neural Networks (RNNs) in environments with limited computational resources by analyzing the stability of the Minimal Gated Unit (MGU) network, a lightweight alternative to common gated RNNs used in system identification. We derive sufficient parametric conditions for the MGU network's input-to-state stability (ISS) and incremental input-to-state stability ($\delta$ISS) properties. These conditions enable a-posteriori validation of model stability and form the basis for novel stability-promoting training methodologies, including a warm-start of the network's parameters and a projected gradient-based optimization scheme, both of which are presented in this work. Comparative evaluation, including robustness analysis and validation on synthetic and real-world data (i.e., the Silverbox benchmark), demonstrates that the MGU network successfully combines formal stability guarantees with superior parameter efficiency and faster inference times compared to other state-of-the-art RNNs, while maintaining comparable and satisfactory accuracy. Notably, the results attained on the Silverbox benchmark illustrate that the stable MGU network effectively captures the system dynamics, whereas other stable RNNs fail to converge to a reliable model.

Paper Structure (23 sections, 8 theorems, 44 equations, 6 figures, 1 table)

Introduction
Contributions
Paper organization
Preliminaries and problem statement
Notation and preliminaries
Problem statement
Minimal gated unit networks
Stability properties of MGU networks
Stability of MGU layers
Stability of MGU networks
Stability-promoted training of MGU networks
Traditional recurrent neural network training
Loss augmentation and stability-driven early stopping
Parameters warm-start
Projected gradient-based optimization method
...and 8 more sections

Key Result

Proposition 1

The set $\mathcal{H}_{\mathrm{inv}}^{(l)} := \left[-1,1\right]^{n_{h}^{(l)}}$ is a forward invariant compact set for the dynamics in \ref{eq:MGU_layer}, meaning that, for any $l \in \mathcal{L}$, if $\boldsymbol{h}_{0}^{(l)} \in \mathcal{H}_{\mathrm{inv}}^{(l)}$, then $\boldsymbol{h}_{k}^{(l)} \in \mathcal{H}_{\mathrm{inv}}^{(l)}$ for all $k \geq 0$.
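The invariance claim in Proposition 1 can be checked numerically with a minimal sketch. The layer equations are not reproduced on this page, so the code assumes the commonly used MGU formulation: a sigmoid forget gate `f`, a tanh candidate state computed from the gated state, and the update `h_next = (1 - f) * h + f * candidate`. All function and variable names (`mgu_step`, `simulate`, the weight layout) are hypothetical and chosen for illustration only. Under this assumption, each component of the new state is a convex combination of two values in $[-1, 1]$, so it cannot leave that interval, whatever the weights and inputs are.

```python
import math
import random


def sigmoid(z):
    # Overflow-free sigmoid via the identity sigma(z) = (1 + tanh(z / 2)) / 2.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def affine(W, U, b, x, h):
    # Row-wise evaluation of W x + U h + b for list-of-lists matrices.
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def mgu_step(params, x, h):
    # One step of an MGU layer (ASSUMED standard formulation, not taken
    # from the paper's equations).
    Wf, Uf, bf, Wc, Uc, bc = params
    f = [sigmoid(z) for z in affine(Wf, Uf, bf, x, h)]          # forget gate in [0, 1]
    gated = [fi * hi for fi, hi in zip(f, h)]                   # gated previous state
    cand = [math.tanh(z) for z in affine(Wc, Uc, bc, x, gated)]  # candidate in [-1, 1]
    # Convex combination of the previous state and the candidate state.
    return [(1.0 - fi) * hi + fi * ci for fi, hi, ci in zip(f, h, cand)]


def random_matrix(rng, rows, cols, scale):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def simulate(n_u=2, n_h=4, steps=200, scale=5.0, seed=0):
    # Run the layer with arbitrary (unconstrained) weights and large inputs,
    # and return the largest absolute hidden-state entry observed.
    rng = random.Random(seed)
    params = (
        random_matrix(rng, n_h, n_u, scale),
        random_matrix(rng, n_h, n_h, scale),
        [rng.uniform(-scale, scale) for _ in range(n_h)],
        random_matrix(rng, n_h, n_u, scale),
        random_matrix(rng, n_h, n_h, scale),
        [rng.uniform(-scale, scale) for _ in range(n_h)],
    )
    h = [rng.uniform(-1.0, 1.0) for _ in range(n_h)]  # initial state inside the set
    max_abs = max(abs(v) for v in h)
    for _ in range(steps):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n_u)]
        h = mgu_step(params, x, h)
        max_abs = max(max_abs, max(abs(v) for v in h))
    return max_abs
```

Calling `simulate()` with different seeds and weight scales should never return a value above 1 (up to floating-point rounding). Note that this only illustrates boundedness of the hidden state; it says nothing about the ISS or $\delta$ISS conditions, which constrain the weights.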

Figures (6)

Figure 1: MGU $l$-th layer architecture at time $k$.
Figure 2: MGU network with $L$ layers at time $k$.
Figure 3: Comparison of MGU and GRU networks for varying numbers of hidden units ($n^{(1)}_h$) on Dataset 1 (pH Reactor). (a) Distribution of the $\textrm{Fit}$ in \ref{eq:fit} over the different sequences that compose $\mathcal{D}_{\mathrm{val}}$, demonstrating comparable performance between MGU and GRU networks. (b) Parameter count, showing that MGU networks require approximately two-thirds of the GRU network parameters (see Remark \ref{rem:mgu_smaller}). (c) Distribution of the normalized inference times, illustrating the computational efficiency of the MGU network architecture. From this analysis, we select $n^{(1)}_h = 7$ for subsequent stability experiments on Dataset 1.
Figure 4: Comparison of stability-promoting methods over 30 training runs on Dataset 1 (pH Reactor), highlighting the performance-stability trade-off. (a) Distribution of the $\textrm{Fit}$ in \ref{eq:fit} on the test set $\mathcal{D}_{\mathrm{tst}}$. (b) Distribution of the $\delta$ISS in-range rate, showing the percentage of epochs satisfying the $\delta$ISS stability condition in \ref{eq:MGU_layer_dISS_condition} for MGU models and the corresponding condition from \cite{bonassi_stability_2021} for GRU models. Note the balance achieved by MGU$_{\text{WS}}$ and the low accuracy of the PGM and PGM+WS methodologies.
Figure 5: Time-domain output on $\mathcal{D}_{\mathrm{tst}}$ of Dataset 1 (pH Reactor) for the median-performing model (solid lines) from each configuration. The filled region indicates the Min-Max range across the 30 training runs. The ground truth (real system output) is shown as a dashed black line.
...and 1 more figure
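The two-thirds parameter ratio mentioned in the caption of Figure 3(b) can be reproduced with a short counting sketch. It assumes the usual layer structures, which are not spelled out on this page: an MGU layer with two affine blocks (forget gate and candidate state), a GRU layer with three (update gate, reset gate, candidate state), and one bias vector per block. The function names and the input dimension `n_u = 1` are hypothetical.

```python
def gated_layer_params(n_u, n_h, n_blocks):
    # Each block holds an input matrix (n_h x n_u), a recurrent matrix
    # (n_h x n_h), and one bias vector (n_h). ASSUMPTION: single bias per block.
    return n_blocks * (n_h * n_u + n_h * n_h + n_h)


def mgu_params(n_u, n_h):
    # ASSUMED MGU structure: forget gate + candidate state.
    return gated_layer_params(n_u, n_h, n_blocks=2)


def gru_params(n_u, n_h):
    # ASSUMED GRU structure: update gate + reset gate + candidate state.
    return gated_layer_params(n_u, n_h, n_blocks=3)


if __name__ == "__main__":
    n_u, n_h = 1, 7  # n_h = 7 as selected for Dataset 1; n_u = 1 is an assumption
    print(mgu_params(n_u, n_h), gru_params(n_u, n_h))
```

Under these assumptions the recurrent layer alone gives a ratio of exactly 2/3. Any parameters shared by both models, such as an output layer, push the overall ratio slightly above 2/3, which is consistent with the caption's "approximately".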

Theorems & Definitions (13)

Definition 1: ISS \cite{jiang_input--state_2001, terzi_learning_2021}
Remark 1
Definition 2: $\delta$ISS \cite{bayerDiscretetimeIncrementalISS2013, terzi_learning_2021}
Remark 2
Proposition 1: Forward invariant set of the hidden state
Remark 3
Theorem 1: ISS of MGU layers
Theorem 2: $\delta$ISS of MGU layers
Proposition 2
Theorem 3: ISS of MGU networks
...and 3 more
