Machine Learning for Predicting Chaotic Systems

Christof Schötz; Alistair White; Maximilian Gelbrecht; Niklas Boers

TL;DR

This paper introduces the cumulative maximum error, a novel metric that combines desirable properties of traditional metrics and is tailored for chaotic systems, and shows that well-tuned simple methods, as well as untuned baseline methods, often outperform state-of-the-art deep learning models.

Abstract

Predicting chaotic dynamical systems is critical in many scientific fields, such as weather forecasting, but challenging due to the characteristic sensitive dependence on initial conditions. Traditional modeling approaches require extensive domain knowledge, often leading to a shift towards data-driven methods using machine learning. However, existing research provides inconclusive results on which machine learning methods are best suited for predicting chaotic systems. In this paper, we compare different lightweight and heavyweight machine learning architectures using extensive existing benchmark databases, as well as a newly introduced database that allows for uncertainty quantification in the benchmark results. In addition to state-of-the-art methods from the literature, we also present new advantageous variants of established methods. Hyperparameter tuning is adjusted based on computational cost, with more tuning allocated to less costly methods. Furthermore, we introduce the cumulative maximum error, a novel metric that combines desirable properties of traditional metrics and is tailored for chaotic systems. Our results show that well-tuned simple methods, as well as untuned baseline methods, often outperform state-of-the-art deep learning models, but their performance can vary significantly with different experimental setups. These findings highlight the importance of aligning prediction methods with data characteristics and caution against the indiscriminate use of overly complex models.
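
The sensitive dependence on initial conditions mentioned above can be illustrated with a short sketch. It assumes the Lorenz63 system (to which all DeebLorenz systems are related) with the standard parameter values σ = 10, ρ = 28, β = 8/3, and a fixed-step fourth-order Runge-Kutta integrator chosen purely for illustration; it is not the solver setup used in the paper, and the helper names (`rk4_step`, `separation`) are hypothetical. Two trajectories that start 10^-8 apart are integrated side by side and their distance is printed at a few times.

```python
import math

# Standard Lorenz63 parameter values (sigma, rho, beta); illustrative choice.
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0


def lorenz63(state):
    """Lorenz63 vector field f(x, y, z): a degree-two polynomial in the state."""
    x, y, z = state
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step of size dt (hypothetical helper)."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))
    k1 = lorenz63(state)
    k2 = lorenz63(shifted(k1, dt / 2))
    k3 = lorenz63(shifted(k2, dt / 2))
    k4 = lorenz63(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def separation(x0, eps=1e-8, dt=0.01, t_end=25.0):
    """Distance between two trajectories whose initial states differ by eps
    in the first coordinate, recorded after every integration step."""
    a = tuple(x0)
    b = (x0[0] + eps,) + tuple(x0[1:])
    dists = [math.dist(a, b)]
    for _ in range(round(t_end / dt)):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        dists.append(math.dist(a, b))
    return dists


DT = 0.01
d = separation((1.0, 1.0, 1.0), dt=DT)
for t in (0, 5, 10, 15, 20, 25):
    print(f"t = {t:2d}: distance = {d[round(t / DT)]:.2e}")
```

The distance grows roughly exponentially until it saturates at the size of the attractor, which is why point forecasts of a chaotic system lose accuracy after a finite horizon no matter how good the model is.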

Paper Structure (50 sections, 10 equations, 13 figures, 32 tables)

Introduction
Methodology
Data
DeebLorenz
Dysts
Prediction Task and Evaluation
Estimation Methods
Cumulative Maximum Error (CME)
Hyperparameter Tuning
Results and 10 Key Insights
Lightweight Methods Outperform Heavyweight Methods
Polynomial Fits to Noisefree Data Are Accurate Emulators
Method Selection and Hyperparameter Optimization Are Essential
Noise and Timestep Design Influence Absolute and Relative Performance
Repeating Experiments is Crucial for a Robust Evaluation
...and 35 more sections
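
The heading "Polynomial Fits to Noisefree Data Are Accurate Emulators", together with the remark in the Figure 2 caption that Lorenz63 has a sparse degree-two polynomial vector field, suggests why polynomial regression can do well on clean data. The following is a minimal sketch under our own assumptions, not the paper's estimator: noise-free Lorenz63 data is simulated with RK4, time derivatives are estimated with a fourth-order central finite-difference stencil, and all ten monomials of degree at most two are fitted by ordinary least squares via the normal equations. The function names (`fit_quadratic_vector_field`, `solve`, `features`) are hypothetical.

```python
# Sketch only: hypothetical helpers, standard Lorenz63 parameters, noise-free data.
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0
NAMES = ["1", "x", "y", "z", "x^2", "xy", "xz", "y^2", "yz", "z^2"]


def lorenz63(s):
    x, y, z = s
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)


def rk4_step(s, h):
    def shifted(k, scale):
        return tuple(si + scale * ki for si, ki in zip(s, k))
    k1 = lorenz63(s)
    k2 = lorenz63(shifted(k1, h / 2))
    k3 = lorenz63(shifted(k2, h / 2))
    k4 = lorenz63(shifted(k3, h))
    return tuple(si + h / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def features(s):
    """All monomials of degree <= 2 in (x, y, z), ordered as in NAMES."""
    x, y, z = s
    return [1.0, x, y, z, x * x, x * y, x * z, y * y, y * z, z * z]


def solve(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def fit_quadratic_vector_field(states, dt):
    """Least-squares fit of a degree-2 polynomial to each component of the
    time derivative, estimated with a fourth-order central difference."""
    idx = range(2, len(states) - 2)
    rows = [features(states[i]) for i in idx]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(10)] for i in range(10)]
    coeffs = []
    for dim in range(3):
        deriv = [(-states[i + 2][dim] + 8 * states[i + 1][dim]
                  - 8 * states[i - 1][dim] + states[i - 2][dim]) / (12 * dt) for i in idx]
        atb = [sum(r[i] * g for r, g in zip(rows, deriv)) for i in range(10)]
        coeffs.append(solve(ata, atb))
    return coeffs


dt, state = 0.002, (-8.0, 7.0, 27.0)
states = [state]
for _ in range(5000):
    state = rk4_step(state, dt)
    states.append(state)

coeffs = fit_quadratic_vector_field(states, dt)
for label, row in zip(("dx/dt", "dy/dt", "dz/dt"), coeffs):
    terms = [f"{c:+.3f}*{n}" for c, n in zip(row, NAMES) if abs(c) > 1e-2]
    print(label, "=", " ".join(terms))
```

Because the true vector field lies exactly in the span of these ten monomials, the fit recovers it closely from clean data; with noisy observations the finite-difference derivatives, and hence the coefficients, would be considerably less reliable.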

Figures (13)

Figure 1: Visual overview of the databases used in this study. We consider two different databases, Dysts and DeebLorenz. Each database consists of different (dynamical) systems (133 for Dysts, 3 for DeebLorenz). Systems may come with parameters, including the initial conditions. After random initialization of the parameters, ground truth data is generated by solving the differential equations numerically. From the ground truth data, we generate synthetic 'observation' time series using different observation schemes (with or without noise; constant or random timestep $\Delta\!t$). In Dysts, noise is treated as system noise, i.e., it influences the evolution of the state of the system; in DeebLorenz, noise is treated as measurement noise, i.e., the noisy data is observed but the system evolves according to the ground truth. For each system and observation scheme, we create two datasets: a validation dataset for hyperparameter tuning and a testing dataset for performance evaluation. Each dataset consists of one (Dysts) or several (DeebLorenz) time series. Each time series is divided into two parts: the first part is designated as training data, while the role of the second part depends on the dataset. In the validation dataset it serves as validation data, and in the testing dataset it serves as testing data. Note that we use the terms 'testing' and 'validation' both to differentiate between datasets and to distinguish training data from the ground truth data to which predictions are compared. In this setup, the training data and validation data of the validation dataset and the training data and testing data of the testing dataset are all mutually exclusive.
Figure 2: The three systems of the database DeebLorenz. The plots show one example of a ground truth for each system over an interval of 15 time units. The top row illustrates the attractor of each system in a 2D projection of the state space. The bottom row shows individual time series plots for the three dimensions of the state space. All three systems are related to the Lorenz63 system (Lorenz, 1963), whose vector field $f$ is a sparse polynomial of degree two. The vector field $f$ has three parameters. In Lorenz63std, they are set to the default values found in the literature. In Lorenz63random, these three parameters are sampled for each time series from a probability distribution centered around the default values. In Lorenz63nonpar, the parameter values depend on the state through a function that, for each time series, is drawn randomly from a Gaussian process.
Figure 3: The plots show one example (out of 100) of training and testing data from the testing dataset of DeebLorenz. Only the first state dimension (out of three) is shown. Observations are drawn in green, predictions of the best method in red, and the ground truth in black. Because the noise is relatively small, part of the observation time series is shown zoomed in to make it visible.
Figure 4: The Cumulative Maximum Error ($\mathsf{CME}$) for noiseless and noisy test data with constant timestep $\Delta\!t$ for the system Lorenz63random from the database DeebLorenz. The dots are colored according to the sum of the average tuning and testing time of the respective method. Distance to the diagonal indicates sensitivity to noise. The plot shows a low-error and a high-error cluster, with all gradient-descent-based methods in the high-error cluster. Corresponding plots for the remaining combinations of system and timestep setting are given in the appendix; only some of them show a similarly clear clustering.
Figure 5: Comparison between polynomial fit and ODE solver.
...and 8 more figures
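
The observation schemes described in the Figure 1 and Figure 2 captions can be sketched as follows. This is a minimal illustration under explicit assumptions, not the authors' data-generation code: trajectories are integrated with RK4 substeps, a uniform ±10% perturbation of the default parameters stands in for the Lorenz63random distribution (the caption only says the distribution is centered around the defaults), noise is Gaussian, random timesteps are exponentially distributed, and system noise is approximated by perturbing the state once per observation. All function names (`observe`, `advance`, `sample_random_params`) are hypothetical. The point is the difference between measurement noise, which leaves the hidden trajectory untouched, and system noise, which is propagated forward in time.

```python
import math
import random

# Standard Lorenz63 parameters (sigma, rho, beta), as used for Lorenz63std.
DEFAULT_PARAMS = (10.0, 28.0, 8.0 / 3.0)


def lorenz63(state, params):
    sigma, rho, beta = params
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, params, h):
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))
    k1 = lorenz63(state, params)
    k2 = lorenz63(shifted(k1, h / 2), params)
    k3 = lorenz63(shifted(k2, h / 2), params)
    k4 = lorenz63(shifted(k3, h), params)
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def advance(state, params, dt, max_step=0.005):
    """Integrate over an interval dt with RK4 substeps no longer than max_step."""
    n_sub = max(1, math.ceil(dt / max_step))
    for _ in range(n_sub):
        state = rk4_step(state, params, dt / n_sub)
    return state


def sample_random_params(rng, rel_spread=0.1):
    """Hypothetical stand-in for Lorenz63random: scale each default parameter
    by a uniform factor in [1 - rel_spread, 1 + rel_spread]."""
    return tuple(p * rng.uniform(1 - rel_spread, 1 + rel_spread) for p in DEFAULT_PARAMS)


def observe(x0, params, n_obs, noise_sd=0.0, noise_kind="measurement",
            random_dt=False, mean_dt=0.01, rng=None):
    """Return (times, observations) for one observation scheme.

    measurement noise: added to the reported values only; the hidden state keeps
        following the noise-free dynamics (the DeebLorenz setting).
    system noise: added to the state itself, so it affects the future evolution
        (a crude discrete stand-in for the Dysts setting).
    random_dt: timesteps drawn from an exponential distribution with mean
        mean_dt (hypothetical choice); otherwise every timestep equals mean_dt.
    """
    rng = rng or random.Random(0)
    t, state = 0.0, tuple(x0)
    times, obs = [], []
    for _ in range(n_obs):
        noisy = tuple(s + rng.gauss(0.0, noise_sd) for s in state)
        if noise_kind == "system":
            state = noisy  # the perturbation is propagated forward in time
        times.append(t)
        obs.append(noisy)
        dt = rng.expovariate(1.0 / mean_dt) if random_dt else mean_dt
        state = advance(state, params, dt)
        t += dt
    return times, obs


params = sample_random_params(random.Random(42))
times, clean = observe((1.0, 1.0, 1.0), params, 500)
_, measured = observe((1.0, 1.0, 1.0), params, 500, noise_sd=0.1, rng=random.Random(1))
_, system = observe((1.0, 1.0, 1.0), params, 500, noise_sd=0.1,
                    noise_kind="system", rng=random.Random(1))
print("sampled parameters:", params)
print("last clean / measured / system-noise state:", clean[-1], measured[-1], system[-1])
```

With measurement noise, every observation stays within a few noise standard deviations of the clean trajectory; with system noise, the perturbed trajectory eventually decorrelates from the clean one because of the chaotic dynamics.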

Theorems & Definitions (1)

proof