On the choice of the non-trainable internal weights in random feature maps

Pinak Mandal, Georg A. Gottwald, Nicholas Cranch

TL;DR

The paper tackles how to select non-trainable internal weights in random feature maps to enhance forecasting of chaotic dynamical systems. By defining good internal parameters as those mapping training data into the nonlinear, non-saturated region of the tanh activation, the authors show that the number of good features $N_g$ acts as an effective feature dimension that governs forecasting skill. They introduce fast, data-informed hit-and-run sampling to uniformly draw from the good-parameter set, demonstrating superior forecasting performance and substantially lower training costs compared to gradient-descent-trained single-layer networks. The results also show that random feature maps with good internal weights better reproduce long-time statistical properties, suggesting practical advantages for reservoir-like forecasting with minimal optimization. Overall, the work provides a principled, optimization-free method to seed non-trainable parameters yielding strong predictive performance in dynamical systems models.

Abstract

The computationally cheap machine learning architecture of random feature maps can be viewed as a single-layer feedforward network in which the weights of the hidden layer are random but fixed and only the outer weights are learned via linear regression. The internal weights are typically chosen from a prescribed distribution. The choice of the internal weights significantly impacts the accuracy of random feature maps. We address here the task of how to best select the internal weights. In particular, we consider the forecasting problem whereby random feature maps are used to learn a one-step propagator map for a dynamical system. We provide a computationally cheap hit-and-run algorithm to select good internal weights which lead to good forecasting skill. We show that the number of good features is the main factor controlling the forecasting skill of random feature maps and acts as an effective feature dimension. Lastly, we compare random feature maps with single-layer feedforward neural networks in which the internal weights are now learned using gradient descent. We find that random feature maps have superior forecasting capabilities whilst having several orders of magnitude lower computational cost.
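
The architecture described in the abstract can be illustrated with a minimal sketch: the internal weights are drawn once and frozen, and only the outer weights are fitted by ridge regression on one-step pairs. The function names `fit_random_feature_map` and `step`, the default ranges `w=0.1` and `b=1.0`, and the uniform sampling of the internal weights are assumptions made for illustration; `D_r` and `beta` follow the symbols used in the Figure 1 caption. This is not the authors' implementation.

```python
import numpy as np

def fit_random_feature_map(U, D_r=300, w=0.1, b=1.0, beta=4e-5, seed=0):
    """Fit a one-step propagator from a trajectory U of shape (N+1, D).

    Internal weights are sampled uniformly (an assumption of this sketch)
    and never trained; only W_out is learned, via ridge regression.
    """
    rng = np.random.default_rng(seed)
    D = U.shape[1]
    W_in = rng.uniform(-w, w, size=(D_r, D))   # fixed internal weights
    b_in = rng.uniform(-b, b, size=D_r)        # fixed internal biases
    X, Y = U[:-1], U[1:]                       # inputs and one-step targets
    Phi = np.tanh(X @ W_in.T + b_in)           # random features, shape (N, D_r)
    # Ridge solution: W_out = Y^T Phi (Phi^T Phi + beta I)^{-1}
    A = Phi.T @ Phi + beta * np.eye(D_r)
    W_out = np.linalg.solve(A, Phi.T @ Y).T    # shape (D, D_r)
    return W_in, b_in, W_out

def step(u, W_in, b_in, W_out):
    """Apply the learned surrogate map once to a state u of shape (D,)."""
    return W_out @ np.tanh(W_in @ u + b_in)
```

A forecast is obtained by applying `step` repeatedly to its own output. Because the only fitted quantity is `W_out`, training reduces to a single linear solve.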

Paper Structure (16 sections, 28 equations, 23 figures, 3 algorithms)

This paper contains 16 sections, 28 equations, 23 figures, 3 algorithms.

Figures (23)

  • Figure 1: Contour plots of the mean and standard deviation of the forecast time $\tau_f$ computed using $\mathbf{W}_{\rm in},\mathbf{b}_{\rm in}$ sampled uniformly from intervals of variable size $[-w, w]$ and $[-b, b]$ respectively. Samples were drawn for grid points $(w, b)$ on a $30\times30$ regular grid over the domain $(0,0.4)\times(0, 4.0)$. Averages are taken over $M=100$ realizations per grid point $(w, b)$, for a feature dimension $D_r=300$, training data length $N=20,000$ and regularization parameter $\beta=4\times10^{-5}$, using the same training and validation data for each realization.
  • Figure 2: Domain and range of features produced by a $\tanh$-activation function with $L_0=0.4$ and $L_1=3.5$, leading to linear, saturated and good features.
  • Figure 3: Empirical histograms of average fractions of good, linear and saturated features, $p_g$, $p_l$ and $p_s$, respectively, for random feature maps corresponding to two groups: large forecast times with $\tau_f>8$ and low forecast times with $\tau_f<0.5$. The random feature maps are the same as those used in Figure 1. Each group contains $500$ samples and the histograms depict the probability of having a certain value of the respective fractions in each group.
  • Figure 4: Mean forecast time $\mathbb{E}[\tau_f]$ as a function of the fraction $p$ of good, linear and saturated features, respectively. The random feature maps are the same as those used in Figure 1. The mean forecast times are computed over bins $[p-\Delta p, p+\Delta p]$ with $\Delta p=0.001$. The shaded region delineates one standard deviation from the mean. We only report on bins which contain more than $100$ samples.
  • Figure 5: Schematic of the one-shot hit-and-run Algorithm \ref{algo:hr-D}. The weight point $\mathbf{0}$ is always an interior point of $\pi S_+^R$ and the cone $V(\mathbf{s})$ is a $D$-dimensional orthant. The set $\pi S_+^R$ is drawn as bounded here, but it may be unbounded depending on the training data $\mathbf{u}_n$.
  • ...and 18 more figures
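
The captions of Figures 2 and 5 can be made concrete with a small sketch. It assumes that a feature $(\mathbf{w}, b)$ counts as linear, saturated or good when $|\mathbf{w}\cdot\mathbf{u}_n + b|$ lies below $L_0$, above $L_1$, or between the two for every training point $\mathbf{u}_n$; features meeting none of these conditions receive the label "mixed", which is introduced here and does not come from the paper. The sampler is the textbook hit-and-run walk, not the one-shot algorithm of Figure 5, and as a simplification it only explores the branch where $\mathbf{w}\cdot\mathbf{u}_n + b$ is positive. The function names are hypothetical.

```python
import numpy as np

L0, L1 = 0.4, 3.5  # thresholds quoted in the Figure 2 caption

def classify_features(W_in, b_in, U, L0=L0, L1=L1):
    """Label each feature by where |w.u_n + b| falls over all training points."""
    Z = np.abs(U @ W_in.T + b_in)                           # shape (N, D_r)
    labels = np.full(W_in.shape[0], "mixed", dtype=object)  # "mixed" is our own label
    labels[(Z < L0).all(axis=0)] = "linear"
    labels[(Z > L1).all(axis=0)] = "saturated"
    labels[((Z > L0) & (Z < L1)).all(axis=0)] = "good"
    return labels

def hit_and_run_good(U, n_steps=50, L0=L0, L1=L1, seed=0):
    """Textbook hit-and-run over {(w, b): L0 < w.u_n + b < L1 for all n}."""
    rng = np.random.default_rng(seed)
    N, D = U.shape
    A = np.hstack([U, np.ones((N, 1))])      # A @ (w, b) gives w.u_n + b
    x = np.zeros(D + 1)
    x[-1] = 0.5 * (L0 + L1)                  # w = 0 with a mid-range bias is interior
    for _ in range(n_steps):
        d = rng.standard_normal(D + 1)
        d /= np.linalg.norm(d)               # random unit direction
        z, r = A @ x, A @ d                  # along the line: L0 < z + t * r < L1
        keep = np.abs(r) > 1e-12             # constraints that vary along the line
        if not keep.any():
            raise ValueError("chord is unbounded along this direction")
        t1 = (L0 - z[keep]) / r[keep]
        t2 = (L1 - z[keep]) / r[keep]
        t_lo = np.minimum(t1, t2).max()      # feasible chord is (t_lo, t_hi)
        t_hi = np.maximum(t1, t2).min()
        x = x + rng.uniform(t_lo, t_hi) * d  # uniform point on the chord
    return x[:-1], x[-1]
```

The fraction of good features, written $p_g$ in the Figure 3 caption, would then be `np.mean(classify_features(W_in, b_in, U) == "good")` under the same assumptions.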