On the choice of the non-trainable internal weights in random feature maps
Pinak Mandal, Georg A. Gottwald, Nicholas Cranch
TL;DR
The paper addresses how to select the non-trainable internal weights of random feature maps so as to improve forecasts of chaotic dynamical systems. Good internal parameters are defined as those that map the training data into the nonlinear, non-saturated region of the tanh activation, and the authors show that the number of good features $N_g$ acts as an effective feature dimension that governs forecasting skill. They introduce a fast, data-informed hit-and-run sampling algorithm that draws uniformly from the set of good parameters, and they report better forecasting performance at substantially lower training cost than single-layer networks trained by gradient descent. The results also show that random feature maps with good internal weights better reproduce long-time statistical properties, suggesting practical advantages for reservoir-like forecasting with minimal optimization. Overall, the work provides a principled, optimization-free method for choosing the non-trainable parameters that yields strong predictive performance for dynamical systems.
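The idea of sampling good parameters can be illustrated with a minimal sketch. It assumes that a feature with weights $(w, b)$ counts as good when $L_0 \le w \cdot u_n + b \le L_1$ for every training point $u_n$, which is one convex piece of the good set (the mirrored piece follows by flipping the sign of $(w, b)$, since tanh is odd). The thresholds `lo=0.4` and `hi=3.5`, the function names, and the starting point are illustrative assumptions, not values or code taken from the paper.

```python
import numpy as np

def good_set_constraints(U, lo=0.4, hi=3.5):
    """Linear inequalities A x <= c for x = (w, b) such that
    lo <= w.u_n + b <= hi for every column u_n of U (shape D x N).
    lo and hi are assumed placeholder thresholds."""
    D, N = U.shape
    G = np.hstack([U.T, np.ones((N, 1))])  # row n is (u_n, 1), so G @ x = w.u_n + b
    A = np.vstack([G, -G])                 # G x <= hi  and  -G x <= -lo
    c = np.concatenate([np.full(N, hi), np.full(N, -lo)])
    return A, c

def hit_and_run(A, c, x0, n_samples, rng):
    """Generic hit-and-run sampler on the polytope {x : A x <= c},
    started from a feasible point x0."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for k in range(n_samples):
        # Random direction, uniform on the unit sphere.
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        slack = c - A @ x   # nonnegative while x is feasible
        rate = A @ d        # how fast each constraint tightens along d
        pos, neg = rate > 1e-12, rate < -1e-12
        t_max = np.min(slack[pos] / rate[pos]) if pos.any() else np.inf
        t_min = np.max(slack[neg] / rate[neg]) if neg.any() else -np.inf
        if not (np.isfinite(t_min) and np.isfinite(t_max)):
            raise ValueError("feasible set is unbounded along the sampled direction")
        # Move to a uniformly chosen point on the feasible chord.
        x = x + rng.uniform(t_min, t_max) * d
        samples[k] = x
    return samples
```

With scalar training data `U = np.array([[0.0, 1.0, 2.0]])`, the point `(w, b) = (0, 1.95)` is feasible and can serve as `x0`; each returned row is one candidate `(w, b)` pair. Successive hit-and-run samples are correlated, so in practice one would thin the chain or discard an initial stretch.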
Abstract
The computationally cheap machine learning architecture of random feature maps can be viewed as a single-layer feedforward network in which the weights of the hidden layer are random but fixed and only the outer weights are learned via linear regression. The internal weights are typically chosen from a prescribed distribution. The choice of the internal weights significantly impacts the accuracy of random feature maps. We address here the task of how to best select the internal weights. In particular, we consider the forecasting problem whereby random feature maps are used to learn a one-step propagator map for a dynamical system. We provide a computationally cheap hit-and-run algorithm to select good internal weights which lead to good forecasting skill. We show that the number of good features is the main factor controlling the forecasting skill of random feature maps and acts as an effective feature dimension. Lastly, we compare random feature maps with single-layer feedforward neural networks in which the internal weights are now learned using gradient descent. We find that random feature maps have superior forecasting capabilities whilst having several orders of magnitude lower computational cost.
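The architecture described in the abstract can be sketched in a few lines: a fixed random hidden layer with tanh activation, followed by outer weights fitted by regularized linear regression to a one-step map. Everything specific here is an assumption made for illustration: the uniform sampling of the internal weights, the ridge parameter `beta`, the function names, and the logistic map used as a toy dynamical system.

```python
import numpy as np

def logistic_trajectory(n_steps, r=3.9, x0=0.2):
    """Toy scalar dynamical system (an assumed stand-in for real data).
    Returns an array of shape (1, n_steps + 1)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        x[n + 1] = r * x[n] * (1.0 - x[n])
    return x[None, :]

def fit_random_feature_map(U, n_features=100, beta=1e-6, scale=1.0, seed=0):
    """Fit u_{n+1} ~ W_out tanh(W_in u_n + b_in) with W_in, b_in random and fixed.
    U has shape (D, N); columns are consecutive states."""
    rng = np.random.default_rng(seed)
    D = U.shape[0]
    # Internal weights: drawn once and never trained (assumed uniform distribution).
    W_in = rng.uniform(-scale, scale, size=(n_features, D))
    b_in = rng.uniform(-scale, scale, size=n_features)
    X, Y = U[:, :-1], U[:, 1:]                 # inputs and one-step-ahead targets
    Phi = np.tanh(W_in @ X + b_in[:, None])    # feature matrix, (n_features, N - 1)
    # Ridge regression: W_out = Y Phi^T (Phi Phi^T + beta I)^{-1}
    A = Phi @ Phi.T + beta * np.eye(n_features)
    W_out = np.linalg.solve(A, Phi @ Y.T).T
    return W_in, b_in, W_out

def one_step(u, W_in, b_in, W_out):
    """Apply the learned one-step propagator to a single state u of shape (D,)."""
    return W_out @ np.tanh(W_in @ u + b_in)
```

A forecast is produced by iterating `one_step` from an initial state. Only `W_out` is learned, and it comes from a single linear solve, which is why training is cheap compared with gradient descent on all weights.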
