State Derivative Normalization for Continuous-Time Deep Neural Networks

Jonas Weigand; Gerben I. Beintema; Jonas Ulmen; Daniel Görges; Roland Tóth; Maarten Schoukens; Martin Ruskowski

TL;DR

This work addresses numerical and optimization challenges in continuous-time neural state-space models caused by improper normalization of the hidden state, its derivative, or the sampling interval. It introduces state-derivative normalization (SDN): a positive factor $\tau$ that scales the state derivative and that can be interpreted in the state, derivative, or time domain, which reveals a coupling among these three quantities. Three methods for estimating $\tau$ are proposed: treating it as a trainable parameter, selecting it by cross-validation, and a heuristic based on the best linear approximation (BLA), which uses a linear surrogate model to set the normalization. Experiments on the cascaded tanks system (CTS) benchmark show that an appropriate choice of $\tau$ significantly improves the RMSE, with the proposed methods achieving state-of-the-art performance among black-box approaches. The results establish SDN as a practical, robust approach for continuous-time neural-ODE-like models in system identification.

Abstract

The importance of proper data normalization for deep neural networks is well known. However, in continuous-time state-space model estimation, it has been observed that improper normalization of the hidden state, of the hidden state derivative of the model estimate, or even of the time interval can lead to numerical and optimization challenges with deep-learning-based methods, which reduces model quality. In this contribution, we show that these three normalization tasks are inherently coupled. Because of this coupling, we propose a solution to all three normalization challenges by introducing a normalization constant at the state-derivative level. We show that the appropriate choice of the normalization constant is related to the dynamics of the to-be-identified system, and we derive multiple methods of obtaining an effective normalization constant. We compare and discuss all the normalization strategies on a benchmark problem based on experimental data from a cascaded tanks system and compare our results with other methods from the identification literature.
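The coupling between derivative scaling and time scaling can be illustrated with a minimal sketch. It assumes the normalized model has the form $\dot x(t) = \frac{1}{\tau} f(x(t), u(t))$ and uses a forward-Euler rollout; the function `f` is a hypothetical stand-in for a learned network, and the values of `Ts` and `tau` are illustrative only, not taken from the paper's experiments.

```python
import numpy as np

def f(x, u):
    # Hypothetical stand-in for a learned network f_theta(x, u) (assumption).
    return np.tanh(-0.5 * x + u)

def simulate(x0, u_seq, step, tau=1.0):
    """Forward-Euler rollout of dx/dt = f(x, u) / tau with step size `step`."""
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for u in u_seq:
        x = x + step * f(x, u) / tau
        traj.append(x)
    return np.stack(traj)

Ts, tau = 4.0, 100.0                 # illustrative values only (assumption)
u_seq = np.sin(0.1 * np.arange(50))  # arbitrary test input
x0 = np.array([0.2])

# Derivative-domain view: scale the network output by 1/tau, keep the step Ts.
traj_derivative = simulate(x0, u_seq, step=Ts, tau=tau)

# Time-domain view: leave the network unscaled, integrate with the unitless step Ts/tau.
traj_time = simulate(x0, u_seq, step=Ts / tau, tau=1.0)

# Both views produce the same trajectory up to floating-point rounding.
assert np.allclose(traj_derivative, traj_time)
```

Under these assumptions only the ratio $T_\mathrm{s}/\tau$ enters the Euler update, so rescaling the derivative and rescaling time are two descriptions of the same rollout.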

Paper Structure (4 sections, 1 theorem, 16 equations, 4 figures, 1 table)

Interpretation of the Normalization
Estimation of the Normalization Factor
Experimental Results
Conclusion

Key Result

Theorem 1

Given an input trajectory $u(t)$ and a non-constant state trajectory $x(t)$ that satisfies $\dot x(t) = f(x(t),u(t))$ as in the system equations for all $t \in [0,L]$, there exist a $\tau \in \mathbb{R}^+$ and a scalar state transformation $\gamma \hat{x}(t) = x(t)$ such that both the model state
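The role of the two constants can be made concrete with a short derivation. This is a sketch under assumptions that do not come from the theorem statement: the normalized model is taken to be $\dot{\hat{x}}(t) = \frac{1}{\tau}\hat{f}(\hat{x}(t), u(t))$, and "normalized" is read as unit root-mean-square (rms) value over $[0, L]$. Substituting $x(t) = \gamma \hat{x}(t)$ into $\dot x(t) = f(x(t), u(t))$ gives

$$
\dot{\hat{x}}(t) = \frac{1}{\gamma} f(\gamma \hat{x}(t), u(t))
\quad\Longrightarrow\quad
\hat{f}(\hat{x}, u) = \frac{\tau}{\gamma} f(\gamma \hat{x}, u),
$$

so that

$$
\mathrm{rms}(\hat{x}) = \frac{\mathrm{rms}(x)}{\gamma},
\qquad
\mathrm{rms}(\hat{f}) = \frac{\tau}{\gamma}\,\mathrm{rms}(\dot{x}).
$$

Requiring both values to equal one yields

$$
\gamma = \mathrm{rms}(x),
\qquad
\tau = \frac{\mathrm{rms}(x)}{\mathrm{rms}(\dot{x})}.
$$

Both constants are positive and finite because a non-constant trajectory has $\mathrm{rms}(x) > 0$ and $\mathrm{rms}(\dot{x}) > 0$. The state scaling $\gamma$ alone cannot normalize the derivative, since it rescales $x$ and $\dot{x}$ by the same factor; the second constant $\tau$ supplies the missing degree of freedom.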

Figures (4)

Figure 1: Time scaling of the output measurement of the CTS benchmark [Schoukens.2016b]. Black: original measurement (time in seconds). Red: time-domain scaled data (unitless).
Figure 2: Picture of the CTS [Schoukens.2016b].
Figure 3: $200$ independent experiments, each with the same configuration except for the random weight initialization and a fixed scalar normalization factor. We tested $10$ different normalization factors, each repeated $20$ times. The box plot displays the median, lower quartile, upper quartile, minimum, and maximum values. Note that experiments with a normalization of $T_\mathrm{s} / \tau = 40$ sometimes lead to unstable results, with RMSE $> 10^9$.
Figure 4: Simulation results on the CTS benchmark [Schoukens.2016b]. Results of the best models obtained with the trained normalization factor, the cross-validation method, and the BLA method are displayed.
