
Using the SEKF to Transfer NN Models of Dynamical Systems with Limited Data

Joshua E. Hammond, Tyler A. Soderstrom, Brian A. Korgel, Michael Baldea

TL;DR

Experimental validation across damped spring and continuous stirred-tank reactor systems demonstrates that small perturbations to the parameters of the initial model suffice to capture the target system dynamics, using as little as 1% of the original training data.

Abstract

Data-driven models of dynamical systems require extensive amounts of training data. For many practical applications, gathering sufficient data is not feasible due to cost or safety concerns. This work uses the Subset Extended Kalman Filter (SEKF) to adapt pre-trained neural network models to new, similar systems for which only limited data are available. Experimental validation across damped spring and continuous stirred-tank reactor systems demonstrates that small perturbations to the parameters of the initial model suffice to capture the target system dynamics, using as little as 1% of the original training data. In addition, finetuning has a lower computational cost than retraining and reduces generalization error.
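The abstract does not give the SEKF update equations. As a rough illustration only, the sketch below shows a generic extended Kalman filter update of network parameters (treated as a random-walk state) that is restricted to a subset of the weights, which is the general idea the name suggests. It is a minimal sketch under stated assumptions, not the authors' implementation: the tiny network `mlp`, the helpers `jacobian_subset` and `sekf_step`, the finite-difference Jacobian, the noise settings `R` and `Q`, the prior covariance, and the random choice of the subset `idx` are all hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a pre-trained network: one hidden tanh layer,
# with all weights and biases stored in a single flat parameter vector.
N_IN, N_HIDDEN, N_OUT = 2, 8, 1
N_PARAMS = N_HIDDEN * N_IN + N_HIDDEN + N_OUT * N_HIDDEN + N_OUT


def mlp(theta, x):
    i = 0
    W1 = theta[i:i + N_HIDDEN * N_IN].reshape(N_HIDDEN, N_IN); i += N_HIDDEN * N_IN
    b1 = theta[i:i + N_HIDDEN]; i += N_HIDDEN
    W2 = theta[i:i + N_OUT * N_HIDDEN].reshape(N_OUT, N_HIDDEN); i += N_OUT * N_HIDDEN
    b2 = theta[i:i + N_OUT]
    return W2 @ np.tanh(W1 @ x + b1) + b2


def jacobian_subset(f, theta, x, idx, eps=1e-6):
    """Central-difference Jacobian of f(theta, x) with respect to theta[idx]."""
    y0 = f(theta, x)
    J = np.zeros((y0.size, len(idx)))
    for k, j in enumerate(idx):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += eps
        tm[j] -= eps
        J[:, k] = (f(tp, x) - f(tm, x)) / (2 * eps)
    return J


def sekf_step(f, theta, P, x, y, idx, R, Q):
    """One EKF update that treats the parameters as a random-walk state but
    only touches the subset idx; other parameters and cross terms are unchanged."""
    n = len(idx)
    P_s = P[np.ix_(idx, idx)] + Q * np.eye(n)        # predict: random walk adds Q
    H = jacobian_subset(f, theta, x, idx)             # linearize the network output
    S = H @ P_s @ H.T + R                             # innovation covariance
    K = np.linalg.solve(S, H @ P_s).T                 # gain K = P_s H^T S^-1
    residual = y - f(theta, x)
    theta, P = theta.copy(), P.copy()
    theta[idx] += K @ residual                        # correct the subset only
    P[np.ix_(idx, idx)] = (np.eye(n) - K @ H) @ P_s   # covariance update for the block
    return theta, P


# Usage: adapt a "source" model to a slightly perturbed "target" model.
rng = np.random.default_rng(0)
theta_S = rng.normal(0.0, 0.5, N_PARAMS)             # stands in for the source model
theta_T = theta_S + rng.normal(0.0, 0.05, N_PARAMS)  # hypothetical similar target
X = rng.uniform(-1.0, 1.0, (100, N_IN))              # small target data set
Y = np.array([mlp(theta_T, x) for x in X])


def mse(theta):
    return float(np.mean([(mlp(theta, x) - y) ** 2 for x, y in zip(X, Y)]))


idx = np.sort(rng.choice(N_PARAMS, size=20, replace=False))  # assumed random subset
theta, P = theta_S.copy(), 0.1 * np.eye(N_PARAMS)
for _ in range(2):                                   # two passes over the data
    for x, y in zip(X, Y):
        theta, P = sekf_step(mlp, theta, P, x, y, idx, R=0.01 * np.eye(N_OUT), Q=1e-6)
print(f"target MSE before: {mse(theta_S):.2e}, after: {mse(theta):.2e}")
```

Starting the filter at the source parameters, rather than at a fresh initialization, is what makes this a finetuning sketch: the prior covariance keeps the update close to the source model, in the spirit of the "small parameter perturbations" described above.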

Paper Structure (34 sections, 13 equations, 13 figures, 17 tables)

Figures (13)

  • Figure 1: Example trajectory of a damped spring-mass system (solid line) given initial position (blue triangle) and velocity (black arrow). A neural network trained on noisy measurements (blue $+$) makes predictions (orange $\times$) of future positions given the initial position and velocity. A target system with a slightly different damping coefficient (dashed line) illustrates the transfer learning objective.
  • Figure 2: Normalized test loss versus target data size for finetuning and retraining initialization methods. (a) Damped spring system. (b) TCLab system. Error bars indicate standard error across replicates. The dashed line indicates source model performance. Finetuning achieves lower test loss, especially with limited data.
  • Figure 3: Train-test gap (test loss minus training loss) versus target data size, grouped by initialization method. Smaller gaps indicate better generalization. Finetuning exhibits consistently smaller train-test gaps than retraining across all data sizes in both systems. (a) Damped spring system. (b) TCLab system.
  • Figure 4: Distribution of cosine similarity between adapted model parameters and source model parameters. Dashed vertical lines indicate mean cosine similarity for each initialization method, with horizontal error bars showing $\pm 1$ standard deviation. (a) Damped spring system. (b) TCLab system. Finetuned models cluster tightly near 1.0, indicating that successful adaptation requires only small perturbations from the source model. Retrained models exhibit much wider spread and lower similarity, occupying different regions of parameter space.
  • Figure 5: Changes in model weights during transfer learning for the damped spring system with $c^\mathcal{T} = 0.9 c^\mathcal{S}$ and 1,000 target examples. Each subplot shows the distribution of weight changes in a specific layer of the neural network. Contrary to common heuristics in image classification, weight changes are distributed across all layers rather than concentrated in later layers. This suggests that transfer learning for dynamical systems requires holistic parameter adjustments.
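The figure captions above refer to a damped spring source/target pair, a train-test gap, and the cosine similarity between adapted and source parameters. The sketch below illustrates those three quantities with NumPy. It assumes the standard linear model $m\ddot{x} + c\dot{x} + kx = 0$ and that cosine similarity is computed on the flattened parameter vectors; the function names, mass, stiffness, source damping value, time step, horizon, and noise level are illustrative assumptions, and only the ratio $c^\mathcal{T} = 0.9 c^\mathcal{S}$ comes from the Figure 5 caption.

```python
import numpy as np


def damped_spring_positions(x0, v0, c, k=1.0, m=1.0, dt=0.01, n_steps=500,
                            noise_std=0.0, rng=None):
    """Integrate m*x'' + c*x' + k*x = 0 with classic RK4 and return the
    positions at t = dt, 2*dt, ..., n_steps*dt, optionally with Gaussian noise."""
    def deriv(s):
        x, v = s
        return np.array([v, -(c * v + k * x) / m])

    s = np.array([x0, v0], dtype=float)
    xs = np.empty(n_steps)
    for i in range(n_steps):
        k1 = deriv(s)
        k2 = deriv(s + 0.5 * dt * k1)
        k3 = deriv(s + 0.5 * dt * k2)
        k4 = deriv(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs[i] = s[0]
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        xs = xs + rng.normal(0.0, noise_std, n_steps)
    return xs


def cosine_similarity(theta_a, theta_b):
    """Cosine of the angle between two flattened parameter vectors."""
    a, b = np.ravel(theta_a), np.ravel(theta_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def train_test_gap(train_loss, test_loss):
    """Test loss minus training loss, as defined in the Figure 3 caption."""
    return test_loss - train_loss


rng = np.random.default_rng(0)
c_S = 0.5                 # assumed source damping coefficient
c_T = 0.9 * c_S           # target damping, the ratio used in Figure 5
x_src = damped_spring_positions(1.0, 0.0, c_S, noise_std=0.01, rng=rng)  # noisy source data
x_tgt = damped_spring_positions(1.0, 0.0, c_T)                            # clean target trajectory
print(f"max |source - target| position gap: {np.max(np.abs(x_src - x_tgt)):.3f}")

theta_S = rng.normal(size=1000)                    # stand-in source parameters
theta_ft = theta_S + 0.05 * rng.normal(size=1000)  # small "finetuning" perturbation
print(f"cosine similarity: {cosine_similarity(theta_S, theta_ft):.4f}")  # close to 1
```

A small perturbation relative to the norm of the parameter vector leaves the cosine similarity near 1.0, which is the pattern the Figure 4 caption reports for finetuned models.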