Predicting unobserved climate time series data at distant areas via spatial correlation using reservoir computing

Shihori Koyama, Daisuke Inoue, Hiroaki Yoshida, Kazuyuki Aihara, Gouhei Tanaka

TL;DR

This paper tackles the prediction of unobserved climate time series at distant locations by exploiting spatial correlations, comparing reservoir computing (RC), implemented as an Echo State Network, against vector autoregression (VAR). Using JRA-55 reanalysis data for near-surface temperature and pressure, it shows that prediction accuracy declines with increasing observation–target distance, and it identifies an RC predictable range within which RC outperforms VAR for highly correlated data. A key finding is that the correlation between the input and output time series strongly governs predictive performance, which yields a simple criterion for assessing when RC is likely to be effective. The work demonstrates the potential of low-cost, fast, data-driven interpolation for climate monitoring and offers guidance for selecting observation points and extending RC-based methods to other variables and timescales.

Abstract

Collecting time series data spatially distributed across many locations is often important for analyzing climate change and its impacts on ecosystems. However, comprehensive spatial data collection is not always feasible, requiring us to predict climate variables at some locations. This study focuses on predicting climatic elements, specifically near-surface temperature and pressure, at a target location away from a data observation point. Our approach uses two prediction methods: reservoir computing (RC), a machine learning framework with low computational requirements, and vector autoregression (VAR) models, a statistical method for analyzing time series data. Our results show that prediction accuracy degrades with the distance between the observation and target locations. We quantitatively estimate the distance within which effective predictions are possible. We also find that, for climate data, geographical distance is associated with data correlation, and strong data correlation significantly improves the prediction accuracy of RC. In particular, RC outperforms VAR in predicting highly correlated data within the predictive range. These findings suggest that machine learning-based methods can be used more effectively to predict climatic elements in remote locations by assessing in advance the distance from the data observation point to the target. Our study on low-cost and accurate prediction of climate variables has significant value for climate change strategies.
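
The RC approach described in the abstract can be illustrated with a minimal Echo State Network sketch: a fixed random reservoir is driven by the observed series, and only a linear readout is trained (here by ridge regression) to output the series at the target location. Everything specific in this sketch is an assumption made for illustration and is not taken from the paper: the function names, the reservoir size, spectral radius, input scaling and ridge parameter, the synthetic sine-plus-noise data standing in for the observation and target series, and the NRMSE normalization by the standard deviation of the true series.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Draw fixed random input and recurrent weights (hyperparameters are assumptions)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    W = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    # Rescale so that the largest absolute eigenvalue equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W


def run_reservoir(W_in, W, u):
    """Drive the reservoir with inputs u of shape (T, n_in); return states of shape (T, n_res)."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(u), W.shape[0]))
    for n, u_n in enumerate(u):
        x = np.tanh(W_in @ u_n + W @ x)
        states[n] = x
    return states


def fit_readout(states, targets, ridge=1e-6):
    """Train the linear readout (with a bias column) by ridge regression."""
    X = np.hstack([states, np.ones((len(states), 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)


def predict(states, W_out):
    """Apply the trained readout to reservoir states."""
    X = np.hstack([states, np.ones((len(states), 1))])
    return X @ W_out


def nrmse(y_true, y_pred):
    """RMSE normalized by the standard deviation of the true series (assumed definition)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true)


def demo(seed=1):
    """Predict a synthetic 'target' series from a correlated synthetic 'observation' series."""
    rng = np.random.default_rng(seed)
    n = np.arange(3000)
    # Synthetic stand-ins: y1 plays the observed series, y2 a lagged, noisy copy of it.
    y1 = np.sin(2 * np.pi * n / 50) + 0.1 * rng.standard_normal(n.size)
    y2 = 0.8 * np.roll(y1, 3) + 0.05 * rng.standard_normal(n.size)
    W_in, W = make_reservoir(n_in=1, n_res=100)
    states = run_reservoir(W_in, W, y1[:, None])
    washout, split = 300, 2000  # discard initial transient, then train / test
    W_out = fit_readout(states[washout:split], y2[washout:split])
    return nrmse(y2[split:], predict(states[split:], W_out))


if __name__ == "__main__":
    print("test NRMSE:", demo())
```

Because only the readout weights are fitted, training reduces to solving one linear system, which is the source of the low computational cost attributed to RC above.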

Paper Structure (14 sections, 6 equations, 19 figures, 3 tables)

Figures (19)

  • Figure 1: Conceptual diagram of time series prediction at a geographically distant point. The aim is to predict "unobserved" climate time series data at a target location from the "observable" time series data. See the Methods section for details of the prediction methods.
  • Figure 2: (a) An RC model called the Echo State Network (Jaeger, 2001). (b) Locations of the observation point and the target point. The observation point is one point on a yellow band, and the target point is the intersection of the two yellow bands, indicated as Tokyo in this figure. (c) Overview of the Year $N$ dataset. The transient data with length $T_\mathrm{trans}$ is the last 300 steps of Year $(N-2)$ data, the training data with length $T_\mathrm{train}$ is the whole of Year $(N-1)$ data, and the test data with length $T_\mathrm{test}$ is the whole of Year $N$ data.
  • Figure 3: Results of RC-based prediction showing Year 2021 data. (a),(b) temperature, (c),(d) pressure. Predicted time course (red solid line) is superimposed on the test data. The time series data after removing the seasonal trend are used for training. The variables $y_1(n)$ and $y_2(n)$ denote the actual input and output data, respectively, while $y_1'(n)$ and $y_2'(n)$ denote input and output data restored to original scale, respectively.
  • Figure 4: Performance comparison of prediction methods in terms of NRMSE. The prediction methods include RC-based prediction (red filled circles), VAR-based prediction (blue downward triangles), and historical-average-based prediction (green solid line). (a),(b) temperature, (c),(d) pressure prediction. The yellow dashed lines are the scaled correlation coefficients between the true time series data at the observation and target points. Vertical dashed lines indicate the location of the target point (Tokyo). The downward arrows show the thresholds where the NRMSE reaches a plateau.
  • Figure 5: The relation between NRMSE and input-output correlation. The NRMSE values are obtained in temperature prediction for the target points at Tokyo (RC: red circles, VAR: blue downward triangles). The regression lines are drawn for the RC model (red solid line) and VAR model (blue dashed line). The fitting parameters of the lines are determined for the RC and VAR results assuming that the slope parameter is independent and the intercept parameter is common.
  • ...and 14 more figures
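
Two details from the captions above can be sketched in code: the Year $N$ dataset layout from Figure 2(c), and the Figure 5 regression in which the RC and VAR lines share one intercept but have separate slopes. This is a hedged illustration, not the authors' implementation. The function names, the dictionary-of-arrays data layout, and the use of ordinary least squares for the joint fit are all assumptions; only the 300-step transient length and the year assignment come from the caption.

```python
import numpy as np

T_TRANS = 300  # transient length stated in the Figure 2(c) caption


def split_year_dataset(series_by_year, year_n, t_trans=T_TRANS):
    """Build the Year N dataset: transient, training, and test segments.

    series_by_year is a hypothetical mapping {year: 1-D array of one year's samples}.
    """
    transient = series_by_year[year_n - 2][-t_trans:]  # last steps of Year N-2
    train = series_by_year[year_n - 1]                 # whole of Year N-1
    test = series_by_year[year_n]                      # whole of Year N
    return transient, train, test


def fit_common_intercept(corr_rc, nrmse_rc, corr_var, nrmse_var):
    """Jointly fit NRMSE = intercept + slope * correlation for two methods.

    The intercept is shared between methods and each method has its own slope.
    Ordinary least squares is an assumed choice of fitting procedure.
    """
    corr_rc = np.asarray(corr_rc, dtype=float)
    corr_var = np.asarray(corr_var, dtype=float)
    n_rc, n_var = len(corr_rc), len(corr_var)
    # Columns: shared intercept, RC slope, VAR slope.
    design = np.zeros((n_rc + n_var, 3))
    design[:, 0] = 1.0
    design[:n_rc, 1] = corr_rc
    design[n_rc:, 2] = corr_var
    y = np.concatenate([np.asarray(nrmse_rc, dtype=float),
                        np.asarray(nrmse_var, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, slope_rc, slope_var = coef
    return intercept, slope_rc, slope_var


if __name__ == "__main__":
    # Synthetic placeholder data; the array length per year is arbitrary here.
    data = {year: np.arange(400) + 1000 * year for year in (2019, 2020, 2021)}
    transient, train, test = split_year_dataset(data, 2021)
    print(len(transient), len(train), len(test))

    # Synthetic points lying exactly on two lines with a common intercept.
    corr = np.array([0.2, 0.5, 0.9])
    print(fit_common_intercept(corr, 1.0 - 0.8 * corr, corr, 1.0 - 0.5 * corr))
```

Stacking both methods into one design matrix is what ties the intercept together; fitting the two lines separately would in general give two different intercepts.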