Learnable & Interpretable Model Combination in Dynamical Systems Modeling

Tobias Thummerer, Lars Mikelsons

TL;DR

This work addresses how to learn combinations of heterogeneous dynamical-system representations, including ML components and physical simulators, while correctly handling algebraic loops and local event functions in discontinuous models. The dynamics considered are of the form $\dot{\bm{x}}(t) = \bm{f}(\bm{x}(t),\bm{u}(t),\bm{p},t)$. It introduces the HUDA-ODE class within the UODE family to capture mixed algebraic, discrete, and differential equations, with event conditions $\mathbf{c}$, updates $\mathbf{a}$, and outputs $\mathbf{g}$. A generic topology for composing two algebraic subsystems uses a linear connection matrix $\mathbf{W}$ and augmented constraints $\mathbf{c}_a$, $\mathbf{c}_b$, $\mathbf{c}_z$, yielding a solvable, gradient-friendly formulation compatible with FMI. The approach is demonstrated by learning and interpreting several two-subsystem combinations and is supported by a software implementation.

Abstract

During modeling of dynamical systems, often two or more model architectures are combined to obtain a more powerful or efficient model regarding a specific application area. This covers the combination of multiple machine learning architectures, as well as hybrid models, i.e., the combination of physical simulation models and machine learning. In this work, we briefly discuss which types of model are usually combined in dynamical systems modeling and propose a class of models that is capable of expressing mixed algebraic, discrete, and differential equation-based models. Further, we examine different established, as well as new ways of combining these models from the point of view of system theory and highlight two challenges - algebraic loops and local event functions in discontinuous models - that require a special approach. Finally, we propose a new wildcard architecture that is capable of describing arbitrary combinations of models in an easy-to-interpret fashion that can be learned as part of a gradient-based optimization procedure. In a final experiment, different combination architectures between two models are learned, interpreted, and compared using the methodology and software implementation provided.

Paper Structure (6 sections, 8 equations, 5 figures)

Figures (5)

  • Figure 1: The serial topology, as proposed in Thompson:1994 and further discussed, e.g., in Rudolph:2024. The submodel $\bm{s}_a$ processes values before passing them on to subsystem $\bm{s}_b$, which computes the actual results.
  • Figure 2: The parallel topology, as proposed in Thompson:1994 and further discussed, e.g., in Rudolph:2024. The submodel $\bm{s}_b$ learns the residuals of subsystem $\bm{s}_a$ and adds them to the common result.
  • Figure 3: For submodels $\bm{s}_a$ and $\bm{s}_b$ with an algebraic relation between inputs and outputs (for example, if both are feed-forward neural networks, FFNN), algebraic loops can easily be constructed by connecting both models cyclically. Unless this algebraic loop is handled, the resulting combined model (CM) cannot be evaluated correctly, neither for inference nor for backpropagation.
  • Figure 4: During regular simulation (without events, solid arrows), submodel $\bm{s}_a$ maps the state $\bm{x}(t^-)$ to $\hat{\bm{x}}(t^-)$, and submodel $\bm{s}_b$ computes further quantities (for example, the state derivative) based on it. In case of an event in submodel $\bm{s}_b$ (dashed arrows), the local state $\hat{\bm{x}}(t^-)$ within $\bm{s}_b$ is updated to $\hat{\bm{x}}(t^+)$. This new local state must be propagated backwards through submodel $\bm{s}_a$ to obtain a new global state $\bm{x}(t^+)$, from which the numerical integration proceeds.
  • Figure 5: Two algebraic subsystems $\bm{s}_a$ and $\bm{s}_b$ can be combined into a single system $\bm{s}_z$. Because new unknowns $\bm{\upsilon}_a$, $\bm{\upsilon}_b$ and $\bm{\gamma}_z$ are introduced, the system is not well-posed. This shows in the missing connections between the input of the combined model ($\bm{\upsilon}_z$) and its output ($\bm{\gamma}_z$), and between the inputs of the submodels ($\bm{\upsilon}_a$ and $\bm{\upsilon}_b$) and their outputs ($\bm{\gamma}_a$ and $\bm{\gamma}_b$).
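
The topologies in Figures 1, 2, 3, and 5 can be illustrated with a minimal Python sketch in which a single connection matrix $\mathbf{W}$ supplies the missing connections. The block layout $[\bm{\upsilon}_a, \bm{\upsilon}_b, \bm{\gamma}_z] = \mathbf{W}\,[\bm{\gamma}_a, \bm{\gamma}_b, \bm{\upsilon}_z]$ is an assumption made for this example, as are the helper names `connection` and `combine` and the toy submodels. The cyclic case is resolved here with a plain fixed-point iteration as a simple stand-in; this is not claimed to be the authors' method or the API of their software.

```python
import numpy as np

def connection(n, blocks):
    """Build a (3n x 3n) connection matrix from a 3x3 grid of scalar weights.

    Assumed (hypothetical) layout: rows are (v_a, v_b, g_z),
    columns are (g_a, g_b, v_z); each weight scales an n x n identity.
    """
    eye = np.eye(n)
    return np.block([[w * eye for w in row] for row in blocks])

def combine(s_a, s_b, W, n, max_iter=200, tol=1e-12):
    """Return the combined system s_z defined by s_a, s_b and W."""
    def s_z(v_z):
        g_a = np.zeros(n)
        g_b = np.zeros(n)
        for _ in range(max_iter):
            # Route current outputs and the external input to the submodel inputs.
            stacked = W @ np.concatenate([g_a, g_b, v_z])
            v_a, v_b = stacked[:n], stacked[n:2 * n]
            g_a_new, g_b_new = s_a(v_a), s_b(v_b)
            change = max(np.abs(g_a_new - g_a).max(), np.abs(g_b_new - g_b).max())
            g_a, g_b = g_a_new, g_b_new
            if change < tol:
                break
        else:
            raise RuntimeError("fixed-point iteration did not converge")
        # The last block row of W yields the combined output g_z.
        return (W @ np.concatenate([g_a, g_b, v_z]))[2 * n:]
    return s_z

# Toy algebraic submodels (placeholders for, e.g., two small networks).
s_a = lambda v: 2.0 * v
s_b = lambda v: v + 1.0

# Serial (Figure 1): v_a = v_z, v_b = g_a, g_z = g_b.
W_serial = connection(1, [[0, 0, 1], [1, 0, 0], [0, 1, 0]])
# Parallel (Figure 2): v_a = v_z, v_b = v_z, g_z = g_a + g_b.
W_parallel = connection(1, [[0, 0, 1], [0, 0, 1], [1, 1, 0]])
# Cyclic (Figure 3): v_a = v_z + g_b, v_b = g_a, g_z = g_b (algebraic loop).
W_cyclic = connection(1, [[0, 1, 1], [1, 0, 0], [0, 1, 0]])

serial = combine(s_a, s_b, W_serial, n=1)
parallel = combine(s_a, s_b, W_parallel, n=1)
# A contractive s_a is used so that the simple iteration converges.
cyclic = combine(lambda v: 0.25 * v, s_b, W_cyclic, n=1)

print(serial(np.array([3.0])))    # s_b(s_a(3)) = 7
print(parallel(np.array([3.0])))  # s_a(3) + s_b(3) = 10
print(cyclic(np.array([2.0])))    # loop solution g_b = (v_z + 4) / 3 = 2
```

In this sketch the serial and parallel topologies are just particular sparsity patterns of $\mathbf{W}$, so treating its entries as trainable parameters would let a gradient-based optimizer move between them. Any nonzero entry that closes a cycle turns evaluation into a root-finding problem, which is why the fixed-point loop is needed at all.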