
Time-Warping Recurrent Neural Networks for Transfer Learning

Jonathon Hirschi

Abstract

Dynamical systems describe how a physical system evolves over time. Physical processes can evolve faster or slower in different environmental conditions. We define time-warping as rescaling time in a model of a physical system. This thesis proposes a new method of transfer learning for Recurrent Neural Networks (RNNs) based on time-warping. We prove that for a class of linear, first-order differential equations known as time lag models, an LSTM can approximate these systems to any desired accuracy, and that the model can be time-warped while maintaining the approximation accuracy. The Time-Warping method of transfer learning is then evaluated on an applied problem: predicting fuel moisture content (FMC), an important quantity in wildfire modeling. An RNN with LSTM recurrent layers is pretrained on fuels with a characteristic time scale of 10 hours, where large quantities of training data are available. The RNN is then modified with transfer learning to generate predictions for fuels with characteristic time scales of 1 hour, 100 hours, and 1000 hours. The Time-Warping method is evaluated against several established methods of transfer learning. It produces predictions with accuracy comparable to the established methods, despite modifying only a small fraction of the parameters that those methods modify.
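As a rough illustration of the rescaling idea, time-warping a uniformly sampled series by a factor $c$ can be sketched as resampling it onto a stretched or compressed time grid. This is only a sketch of the rescaling step, not the thesis's implementation; the function name and the choice of linear interpolation are assumptions.

```python
import numpy as np

def time_warp_series(x, c):
    """Resample a uniformly sampled series x as if time ran slower
    by a factor c (c > 1 stretches the series, c < 1 compresses it),
    using linear interpolation between the original samples."""
    t_src = np.arange(len(x), dtype=float)       # original time grid
    n_out = int(np.floor(c * (len(x) - 1))) + 1  # cover the same time span
    t_out = np.arange(n_out) / c                 # warped sample times
    return np.interp(t_out, t_src, x)

# Stretching a 5-point ramp by c = 2 doubles its resolution in time.
y = time_warp_series(np.arange(5.0), c=2.0)
```

A warped series like `y` could then be fed to a model trained at the original time scale; the thesis's method instead warps the model, but the underlying notion of rescaling time is the same.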


Paper Structure

This paper contains 59 sections, 6 theorems, 92 equations, 21 figures, and 22 tables.

Key Result

Lemma 1

Let $(z_t)_{t\geq 0}$ be a sequence defined by the relationship in Equation eq:timelag with initial condition $|z_0|<\infty$, where $a\in (0,1)$ and $(X_t)_{t\geq 1}$ is bounded. Then $(z_t)_{t\geq 0}$ is bounded and has the closed form
$$z_t = a^t z_0 + (1-a)\sum_{j=1}^{t} a^{t-j} X_j.$$
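A quick numerical check of a closed form of this kind, under the assumption that the time-lag recurrence takes the standard exponential-smoothing form $z_t = a z_{t-1} + (1-a)X_t$ (the exact form of Equation eq:timelag is not reproduced in this excerpt):

```python
import numpy as np

# Assumed recurrence: z_t = a * z_{t-1} + (1 - a) * X_t, with a in (0, 1).
rng = np.random.default_rng(0)
a, z0 = 0.8, 1.5
X = rng.uniform(-1.0, 1.0, size=50)   # bounded inputs X_1, ..., X_T

# Iterate the recurrence directly.
z = z0
for x in X:
    z = a * z + (1.0 - a) * x

# Closed form: z_T = a^T z_0 + (1 - a) * sum_{j=1}^T a^(T-j) X_j.
T = len(X)
z_closed = a**T * z0 + (1.0 - a) * sum(
    a**(T - j) * X[j - 1] for j in range(1, T + 1)
)
```

Under this assumed recurrence, `z` and `z_closed` agree to floating-point precision, and $|z_t|$ never exceeds $\max(|z_0|, \sup_t |X_t|)$, consistent with the boundedness claim.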

Figures (21)

  • Figure 1: Two complementary views of an RNN. (a) Left: a single RNN cell, following the formulation in Geron-2019-HOM. (b) Right: the RNN unrolled in time.
  • Figure 2: A single LSTM unit. This is an original graphic, developed following the formulation in Geron-2019-HOM. The computations are described in Equations eq:lstm_eqs.
  • Figure 3: Deterministic trajectories from Newton's Law of Cooling for varying cooling constants. Varying values of $k$ change the characteristic time scale. The initial temperature and the ambient temperature of the environment are kept constant in each case.
  • Figure 4: RNN model architecture, pre-trained on FM10 from RAWS. An RNN with one recurrent layer with LSTM cells and three subsequent dense layers.
  • Figure 5: Observed vs Predicted FM10 for Zero-Shot Transfer RNN. The plotted RNN model is the set of weights that corresponds to the median RMSE on the test set out of 100 replications. Note the different x-axis ranges in the plots. (a) Top: All FM10 observations (n=1,232). The zero-shot RNN predictions substantially underestimate the FM10 for wet fuels. (b) Bottom: FM10 observations filtered to less than or equal to $30\%$ (n=1,134). The zero-shot RNN predictions are much more accurate, though the model is overestimating the FM10 for very dry fuels and underestimating the FM10 for wetter fuels.
  • ...and 16 more figures
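The trajectories in Figure 3 follow the classical solution of Newton's Law of Cooling, $T(t) = T_{\mathrm{env}} + (T_0 - T_{\mathrm{env}})\,e^{-kt}$, where $1/k$ is the characteristic time scale. A minimal sketch of how such curves are generated (the specific temperatures and rate constants here are illustrative, not the thesis's values):

```python
import numpy as np

def newton_cooling(t, T0, T_env, k):
    """Closed-form solution of dT/dt = -k * (T - T_env)."""
    return T_env + (T0 - T_env) * np.exp(-k * t)

t = np.linspace(0.0, 24.0, 97)  # time grid in hours
trajectories = {
    k: newton_cooling(t, T0=100.0, T_env=20.0, k=k)
    for k in (0.1, 0.5, 2.0)    # larger k => shorter time scale 1/k
}
```

Every trajectory starts at $T_0$ and decays toward $T_{\mathrm{env}}$; larger $k$ reaches the ambient temperature sooner, which is exactly the variation in characteristic time scale the figure depicts.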

Theorems & Definitions (12)

  • Lemma 1
  • Proof
  • Proposition 1
  • Proof
  • Proposition 2
  • Proof
  • Proposition 3
  • Proof
  • Proposition 4
  • Proof
  • ...and 2 more