On the choice of the non-trainable internal weights in random feature maps

Pinak Mandal; Georg A. Gottwald; Nicholas Cranch

TL;DR

The paper tackles how to select non-trainable internal weights in random feature maps to enhance forecasting of chaotic dynamical systems. By defining good internal parameters as those mapping training data into the nonlinear, non-saturated region of the tanh activation, the authors show that the number of good features $N_g$ acts as an effective feature dimension that governs forecasting skill. They introduce fast, data-informed hit-and-run sampling to uniformly draw from the good-parameter set, demonstrating superior forecasting performance and substantially lower training costs compared to gradient-descent-trained single-layer networks. The results also show that random feature maps with good internal weights better reproduce long-time statistical properties, suggesting practical advantages for reservoir-like forecasting with minimal optimization. Overall, the work provides a principled, optimization-free method to seed non-trainable parameters yielding strong predictive performance in dynamical systems models.

Abstract

The computationally cheap machine learning architecture of random feature maps can be viewed as a single-layer feedforward network in which the weights of the hidden layer are random but fixed and only the outer weights are learned via linear regression. The internal weights are typically chosen from a prescribed distribution. The choice of the internal weights significantly impacts the accuracy of random feature maps. We address here the task of how to best select the internal weights. In particular, we consider the forecasting problem whereby random feature maps are used to learn a one-step propagator map for a dynamical system. We provide a computationally cheap hit-and-run algorithm to select good internal weights which lead to good forecasting skill. We show that the number of good features is the main factor controlling the forecasting skill of random feature maps and acts as an effective feature dimension. Lastly, we compare random feature maps with single-layer feedforward neural networks in which the internal weights are now learned using gradient descent. We find that random feature maps have superior forecasting capabilities whilst having several orders of magnitude lower computational cost.
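The architecture described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. The function names `fit_random_feature_map` and `step` are hypothetical. The uniform sampling intervals $[-w, w]$ and $[-b, b]$ and the defaults $D_r=300$, $\beta=4\times10^{-5}$ are borrowed from the Figure 1 caption. The ridge-regression form used for the outer weights is an assumption about how the linear regression is regularized.

```python
import numpy as np

def fit_random_feature_map(U, D_r=300, w=0.1, b=1.0, beta=4e-5, seed=0):
    """Fit u_{n+1} ~ W tanh(W_in u_n + b_in) from a trajectory U of shape (N+1, D).

    W_in and b_in are drawn once and never trained; only W is learned.
    """
    rng = np.random.default_rng(seed)
    D = U.shape[1]
    W_in = rng.uniform(-w, w, size=(D_r, D))   # fixed internal weights
    b_in = rng.uniform(-b, b, size=D_r)        # fixed internal biases
    Phi = np.tanh(U[:-1] @ W_in.T + b_in)      # features, shape (N, D_r)
    Y = U[1:]                                  # one-step targets, shape (N, D)
    # Ridge regression (assumed form): minimise ||Phi W^T - Y||^2 + beta ||W||^2
    A = Phi.T @ Phi + beta * np.eye(D_r)
    W = np.linalg.solve(A, Phi.T @ Y).T        # outer weights, shape (D, D_r)
    return W_in, b_in, W

def step(u, W_in, b_in, W):
    """Apply the learned one-step propagator to a single state u."""
    return W @ np.tanh(W_in @ u + b_in)
```

A forecast is produced by iterating `step` from an initial state. The only linear solve is of size $D_r \times D_r$, which is where the low training cost relative to gradient descent comes from.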

Paper Structure (16 sections, 28 equations, 23 figures, 3 algorithms)


Introduction
Dynamical setup
Random feature maps
The effect of the internal weights on the performance of random feature maps
How to sample good internal weights
Standard hit-and-run sampling of good internal parameters
One-shot hit-and-run sampling
Performance and comparison of the hit-and-run algorithms
Results for forecasting individual trajectories
Effect of the quality of internal weights on the forecast time $\tau_f$
Effect of the quality of internal weights on the outer weights W
Comparison with a single-layer feedforward network trained with gradient descent
Results for long-time statistical behaviour
Summary and future work
Appendix
...and 1 more sections

Figures (23)

Figure 1: Contour plots of the mean and standard deviation of the forecast time $\tau_f$ computed using $\mathbf{W}_{\rm in},\mathbf{b}_{\rm in}$ sampled uniformly from intervals of variable size $[-w, w]$ and $[-b, b]$ respectively. Samples were drawn for grid points $(w, b)$ on a $30\times30$ regular grid over the domain $(0,0.4)\times(0, 4.0)$. Averages are taken over $M=100$ realizations per grid point $(w, b)$, for a feature dimension $D_r=300$, training data length $N=20,000$ and regularization parameter $\beta=4\times10^{-5}$, using the same training and validation data for each realization.
Figure 2: Domain and range of features produced by a $\tanh$-activation function with $L_0=0.4$ and $L_1=3.5$, leading to linear, saturated and good features.
Figure 3: Empirical histograms of average fractions of good, linear and saturated features, $p_g$, $p_l$ and $p_s$, respectively, for random feature maps corresponding to two groups: large forecast times with $\tau_f>8$ and low forecast times with $\tau_f<0.5$. The random feature maps are the same as those used in Figure 1. Each group contains $500$ samples and the histograms depict the probability of having a certain value of the respective fractions in each group.
Figure 4: Mean forecast time $\mathbb{E}[\tau_f]$ as a function of the fraction $p$ of good, linear and saturated features, respectively. The random feature maps are the same as those used in Figure 1. The mean forecast times are computed over bins $[p-\Delta p, p+\Delta p]$ with $\Delta p=0.001$. The shaded region delineates one standard deviation from the mean. We only report on bins which contain more than $100$ samples.
Figure 5: Schematic of the one-shot hit-and-run algorithm. The weight point $\mathbf{0}$ is always an interior point of $\pi S_+^R$ and the cone $V(\mathbf{s})$ is a $D$-dimensional orthant. The set $\pi S_+^R$ is drawn as bounded here, but it may be unbounded depending on the training data $\mathbf{u}_n$.
...and 18 more figures
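The captions of Figures 2 and 5 suggest how a single good row $(\mathbf{w}, b)$ might be drawn. The sketch below uses textbook hit-and-run on a convex set, not the paper's one-shot variant, and rests on stated assumptions. A row is taken to be good when $L_0 < \mathbf{w}\cdot\mathbf{u}_n + b < L_1$ for every training point (the positive branch), or the mirrored condition on the negative branch. The names `is_good` and `hit_and_run_good_row` and the clipping parameter `t_cap` are hypothetical. The thresholds $L_0=0.4$, $L_1=3.5$ come from the Figure 2 caption.

```python
import numpy as np

L0, L1 = 0.4, 3.5  # thresholds quoted in the Figure 2 caption

def is_good(w, b, U, L0=L0, L1=L1):
    """Assumed criterion: all pre-activations lie in (L0, L1) or all in (-L1, -L0)."""
    z = U @ w + b
    positive = np.all((z > L0) & (z < L1))
    negative = np.all((z < -L0) & (z > -L1))
    return bool(positive or negative)

def hit_and_run_good_row(U, n_steps=50, t_cap=1e3, seed=0):
    """Textbook hit-and-run on the positive branch {(w, b): L0 < w.u_n + b < L1 for all n}."""
    rng = np.random.default_rng(seed)
    N, D = U.shape
    A = np.hstack([U, np.ones((N, 1))])   # rows a_n = (u_n, 1), so A @ x = U @ w + b
    x = np.zeros(D + 1)
    x[-1] = 0.5 * (L0 + L1)               # w = 0, b = midpoint is an interior point
    for _ in range(n_steps):
        d = rng.standard_normal(D + 1)
        d /= np.linalg.norm(d)            # uniformly random direction
        p, q = A @ x, A @ d
        mask = np.abs(q) > 1e-12          # constraints that the line can actually hit
        lo = (L0 - p[mask]) / q[mask]
        hi = (L1 - p[mask]) / q[mask]
        # Feasible segment x + t d; t_cap (hypothetical) guards against an unbounded set
        t_min = np.minimum(lo, hi).max(initial=-t_cap)
        t_max = np.maximum(lo, hi).min(initial=t_cap)
        x = x + rng.uniform(t_min, t_max) * d
    return x[:-1], x[-1]
```

Since $\tanh$ is odd, a row for the negative branch can be obtained by flipping the sign of a sampled $(\mathbf{w}, b)$. Stacking $D_r$ independently sampled rows would give a full $\mathbf{W}_{\rm in}, \mathbf{b}_{\rm in}$ pair.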
