Learning from the past: predicting critical transitions with machine learning trained on surrogates of historical data

Zhiqin Ma, Chunhua Zeng, Yi-Cheng Zhang, Thomas M. Bury

TL;DR

The paper addresses the prediction of critical transitions in complex systems, where traditional early warning signals (EWS) often fail on noisy data or when the transition is not a local bifurcation. It introduces surrogate data-based machine learning (SDML), which trains classifiers on surrogate time series generated from historical transitions to detect an approaching transition. The authors show that SDML achieves higher sensitivity and specificity than variance and lag-1 autocorrelation across datasets from geology, climatology, sociology, and cardiology, and demonstrate robustness to different surrogate-generation methods. The approach offers a system-specific, data-driven alternative that can complement existing EWS tools and improve preparedness for abrupt transitions.

Abstract

Complex systems can undergo critical transitions, where slowly changing environmental conditions trigger a sudden shift to a new, potentially catastrophic state. Early warning signals for these events are crucial for decision-making in fields such as ecology, biology and climate science. Generic early warning signals motivated by dynamical systems theory have had mixed success on real noisy data. More recent studies found that deep learning classifiers trained on synthetic data could improve performance. However, neither of these methods takes advantage of historical, system-specific data. Here, we introduce an approach that trains machine learning classifiers directly on surrogate data of past transitions, namely surrogate data-based machine learning (SDML). The approach provides early warning signals in empirical and experimental data from geology, climatology, sociology, and cardiology with higher sensitivity and specificity than two widely used generic early warning signals: variance and lag-1 autocorrelation. Since the approach is trained directly on surrogates of historical data, it is not bound, unlike previous methods, by the restrictive assumption of a local bifurcation. This system-specific approach can contribute to improved early warning signals to help humans better prepare for or avoid undesirable critical transitions.
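
As a rough illustration of the two generic indicators the abstract uses as baselines, the sketch below computes variance and lag-1 autocorrelation over a rolling window using only the Python standard library. The function names are hypothetical, the default window of 50% of the series follows the Figure 1 caption, and the detrending step (the captions mention Gaussian smoothing) is omitted for brevity. It is a minimal sketch under those assumptions, not the authors' implementation.

```python
def variance(window):
    """Population variance of the values in one window."""
    n = len(window)
    mean = sum(window) / n
    return sum((x - mean) ** 2 for x in window) / n


def lag1_autocorrelation(window):
    """Lag-1 autocorrelation estimate; returns 0.0 for a constant window."""
    n = len(window)
    mean = sum(window) / n
    denom = sum((x - mean) ** 2 for x in window)
    if denom == 0:
        return 0.0
    num = sum((window[i] - mean) * (window[i + 1] - mean) for i in range(n - 1))
    return num / denom


def rolling_indicators(series, window_fraction=0.5):
    """Return (end_index, variance, lag-1 AC) for each rolling-window position.

    window_fraction is an assumed parameter: the window width as a fraction
    of the series length (0.5 mirrors the 50% window in the Figure 1 caption).
    """
    width = max(2, int(len(series) * window_fraction))
    rows = []
    for end in range(width, len(series) + 1):
        window = series[end - width:end]
        rows.append((end - 1, variance(window), lag1_autocorrelation(window)))
    return rows


if __name__ == "__main__":
    # Toy series (made up for illustration): fluctuations grow over time.
    toy = [0.1 * i * (-1) ** i for i in range(20)]
    for end_index, var, ac in rolling_indicators(toy):
        print(end_index, round(var, 4), round(ac, 4))
```

In the generic approach, a sustained rise in either indicator as the window moves forward is read as a warning of an approaching transition. On real data the series would normally be detrended before the indicators are computed.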

Paper Structure

This paper contains 9 sections, 4 figures.

Figures (4)

  • Figure 1: Illustration of the SDML prediction framework. (A) Two trajectories (blue) and smoothing (gray) of chick heart aggregates approaching a critical transition. The vertical dashed line marks the onset of the transition. A green background denotes sections of the time series taken as far from the transition ("Neutral") and red denotes sections close to the transition ("Pre-transition"). The trajectories are divided into training (left) and test (right) trajectories. (B) Thousands of surrogate time series are generated from the neutral (left) and pre-transition (right) training trajectories. (C) Machine learning classifiers used for the binary classification problem of distinguishing neutral from pre-transition time series. We use support vector machines (SVM), long short-term memory (LSTM) networks, convolutional neural networks (CNN), and Multi-Head CNN. (D) The changing trends of indicators prior to the transition in the test trajectory, including variance, lag-1 autocorrelation (AC), and probabilities assigned by the SDML classifier. The arrow illustrates the rolling window (50% of the time series) used for computing early warning signals.
  • Figure 2: Time series of rapid transition events for consecutive recordings from three different empirical systems. (A to C) Sedimentary archives from the Mediterranean Sea for core MS21 at depth 1,022 m (A), core MS66 at depth 1,630 m (B), and core 64PE406E1 at depth 1,760 m (C). (D) Paleoclimate transitions from deuterium content expressed as $\delta$D (in ‰ with respect to the standard mean ocean water). Time is given as years before present (BP). (E) Construction activity in the pre-Hispanic Pueblo societies, given as the number of trees felled per year. Vertical dashed lines indicate onsets of rapid transition events. Green lines are historical trajectories used to generate the surrogate data. Red lines are post-transition trajectories, not used. Blue lines are trajectories used for testing the early warning signals.
  • Figure 3: Trends in indicators prior to rapid transition events in empirical data using the amplitude adjusted Fourier transform (AAFT) surrogate method. Trajectories correspond to blue traces in Fig. 2. (A) Anoxic transition (Sapropel S1) obtained from data on the MS21 core. (B--C) Anoxic transitions (Sapropels S1 and S3) obtained from data on the MS66 core. (D--G) Anoxic transitions (Sapropels S3 to S6) obtained from data on the 64PE406E1 core. (H) End of glaciation I (i.e., the end of the last glaciation). (I) End of glaciation II. (J) Archaeological Period PII. (K) Archaeological Period Early PIII. (L) Archaeological Period Late PIII. (Top) Trajectory (blue) and Gaussian smoothing (grey); (Second down) Variance; (Third down) Lag-1 autocorrelation (AC); (Bottom) probability of an approaching critical transition assigned by the SDML classifier. The lines and shaded areas show the mean and 95% confidence interval, respectively. The arrows indicate the width of the rolling window used to compute early warning signals. The grey bands show transition phases.
  • Figure 4: ROC curves showing performance of indicators in the experimental and empirical data. The ROC curves show the SDML classifier (SDML, purple), variance (Var, orange), and lag-1 autocorrelation (AC, green) for the (A) chick heart aggregates going through a period-doubling bifurcation; (B) sediment data from the MS21 core, (C) MS66 core, and (D) 64PE406E1 core showing rapid transitions to an anoxic state in the Mediterranean Sea; (E) ice core records showing rapid paleoclimate transitions; and (F) transitions in construction activity in pre-Hispanic Pueblo societies. Predictions were obtained from 10 classifiers for each dataset, with 40 equally spaced predictions made between 60% and 100% (A, B, E, and F) or between 80% and 100% (C and D) of the way through the pre-transition data. The area under the curve (AUC), denoted by A, is a performance measure. The insets show the proportion of predictions made by the classifier for true pre-transition trajectories. "Pre-tran" means close to a critical transition, and "Neutral" means far from a critical transition.
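
The ROC curves and AUC values in the Figure 4 caption can be illustrated with a small standard-library sketch. It assumes each indicator yields one score per prediction, with higher scores meaning "closer to a transition", and that the predictions are already split into pre-transition and neutral groups. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
def roc_points(pre_transition_scores, neutral_scores):
    """ROC curve as (false positive rate, true positive rate) points.

    Each distinct score is used as a threshold; a prediction counts as
    positive when its score is at least the threshold.
    """
    thresholds = sorted(set(pre_transition_scores) | set(neutral_scores),
                        reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pre_transition_scores) / len(pre_transition_scores)
        fpr = sum(s >= t for s in neutral_scores) / len(neutral_scores)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def roc_auc(pre_transition_scores, neutral_scores):
    """AUC as the probability that a random pre-transition score exceeds a
    random neutral score, counting ties as one half."""
    wins = 0.0
    for p in pre_transition_scores:
        for q in neutral_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pre_transition_scores) * len(neutral_scores))


if __name__ == "__main__":
    # Made-up classifier probabilities, for illustration only.
    pre = [0.9, 0.4]
    neutral = [0.6, 0.1]
    print(roc_points(pre, neutral))
    print(roc_auc(pre, neutral))
```

An AUC of 0.5 corresponds to an indicator that cannot separate the two groups, and 1.0 to perfect separation. The pairwise and trapezoidal calculations agree on the same scores, so either can be used to compare indicators.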