Information-theoretic analysis of temporal dependence in discrete stochastic processes: Application to precipitation predictability

Juan De Gregorio, David Sánchez, Raúl Toral

TL;DR

This work develops an information-theoretic framework to quantify temporal memory in discrete stochastic processes via the predictability gain $\mathcal{G}_u$, derived from block entropies $H_r$, and links it to the entropy rate $h$ through $G_T=H_1-h$. It introduces a bootstrap-based hypothesis-testing procedure and Fisher’s method to robustly estimate the memory order $\hat{m}^{\text{PG}}$ from finite data, outperforming AIC and BIC in simulations. Applied to daily precipitation records across the contiguous United States, the method reveals a dominance of low-order Markov memory ($m\in\{0,1\}$) with pronounced seasonal and regional variation (e.g., stronger West Coast winter correlations, stronger Southeast summer correlations). The resulting framework provides a transparent, data-driven approach for memory-aware stochastic modeling and real-time forecasting in spatially heterogeneous systems, with potential extension to other domains exhibiting short-term temporal dependencies.
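As a rough illustration of the quantities named above, the following Python sketch estimates block entropies $H_r$ from a symbol sequence and forms a predictability gain from them. Several things here are assumptions, not taken from this summary: the gain is taken to be the negative second discrete difference, $\mathcal{G}_u = 2H_{u+1} - H_u - H_{u+2}$ with $H_0 = 0$ (under this definition the gains telescope to $H_1 - h$, which matches the relation quoted above); entropies use the naive plug-in estimator in nats; and the function names and the test chain are hypothetical.

```python
import math
import random
from collections import Counter


def block_entropy(seq, r):
    """Plug-in estimate of the block entropy H_r (in nats) from overlapping r-blocks."""
    if r == 0:
        return 0.0  # convention assumed here: H_0 = 0
    n_blocks = len(seq) - r + 1
    counts = Counter(tuple(seq[i:i + r]) for i in range(n_blocks))
    return -sum(c / n_blocks * math.log(c / n_blocks) for c in counts.values())


def predictability_gain(seq, u):
    """ASSUMED definition: negative second discrete difference of H_r at order u."""
    return (2.0 * block_entropy(seq, u + 1)
            - block_entropy(seq, u)
            - block_entropy(seq, u + 2))


def simulate_first_order(n, p_stay, seed=0):
    """Hypothetical test case: symmetric binary chain with memory m = 1."""
    rng = random.Random(seed)
    x = [0]
    for _ in range(n - 1):
        # Keep the previous state with probability p_stay, otherwise flip it.
        x.append(x[-1] if rng.random() < p_stay else 1 - x[-1])
    return x


chain = simulate_first_order(50_000, p_stay=0.8)
gains = [predictability_gain(chain, u) for u in range(3)]
print(gains)
```

For this first-order chain the estimated gain at $u=0$ should sit close to $H_1 - h = \ln 2 - h \approx 0.19$ nats, while the gains at $u \geq 1$ should be close to zero, differing from it only through finite-sample noise and bias. Deciding whether such small values are significantly different from zero is the role of the hypothesis-testing step.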

Abstract

Understanding the temporal dependence of precipitation is key to improving weather predictability and developing efficient stochastic rainfall models. We introduce an information-theoretic approach to quantify memory effects in discrete stochastic processes and apply it to daily precipitation records across the contiguous United States. The method is based on the predictability gain, a quantity derived from block entropy that measures the additional information provided by higher-order temporal dependencies. This statistic, combined with bootstrap-based hypothesis testing and Fisher's method, yields a robust memory estimator from finite data. Tests with generated sequences show that this estimator outperforms other model-selection criteria such as AIC and BIC. Applied to precipitation data, the analysis reveals that daily rainfall occurrence is well described by low-order Markov chains, exhibiting regional and seasonal variations, with stronger correlations in winter along the West Coast and in summer in the Southeast, consistent with known climatological patterns. Overall, our findings establish a framework for building parsimonious stochastic descriptions, useful when addressing spatial heterogeneity in the memory structure of precipitation dynamics, and support further advances in real-time, data-driven forecasting schemes.
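Fisher's method, mentioned above, combines $k$ independent p-values into a single one: under the joint null hypothesis the statistic $-2\sum_i \ln p_i$ follows a chi-square distribution with $2k$ degrees of freedom. The sketch below implements only this standard combination step with the Python standard library. The function name is hypothetical, and which p-values the paper combines (and how the bootstrap produces them) is not specified in this summary, so the inputs in the example are made-up values.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i) and
    combined_p is the chi-square survival function with 2k degrees of freedom.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if not all(0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom 2k the survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no special functions are needed.
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Illustrative, made-up p-values (not results from the paper).
stat, p_comb = fisher_combined_pvalue([0.05, 0.05])
print(stat, p_comb)
```

With a single p-value the combined value equals the input, which is a quick sanity check of the closed form. The method assumes the combined tests are independent; whether that holds depends on how the individual tests are constructed.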

Paper Structure

This paper contains 21 sections, 6 theorems, 70 equations, 9 figures, 2 tables.

Key Result

Proposition 1

The predictability gain is additive: the amount of information gained when considering $k$th-order transition probabilities instead of $u$th-order transitions, for $k>u$, can be calculated as

Figures (9)

  • Figure 1: Block entropy (a) and predictability gain (b) for a binary system with possible states $(\beta_0=0,\beta_1=1)$ and memory $m=3$. The inset in panel (b) displays the first discrete derivative of the block entropy. In order to avoid any spurious bias, the set of $8$ transition probabilities $p(0|\beta_{i_1},\beta_{i_2},\beta_{i_3})$ for $i_1,i_2,i_3=0,1$ has been chosen randomly from a uniform distribution. The complementary probabilities are then set as $p(1|\beta_{i_1},\beta_{i_2},\beta_{i_3})=1-p(0|\beta_{i_1},\beta_{i_2},\beta_{i_3})$. Once these probabilities have been chosen, known analytical expressions have been used to compute the block entropies.
  • Figure 2: Example of a system with memory $m=1$. Black dots represent the values of $H_r$, and the black solid line shows the linear function $\mathcal{H}(r)$. The red vertical line at $r=0$ indicates the distance between these two curves, with its length corresponding to the value of $\mathcal{G}_0$.
  • Figure 3: Estimated predictability gain for a station located in Coos Bay, Oregon, for January (a) and August (b), shown in red. The mean and sample standard deviation, shown in black, are computed from $K=2000$ bootstrap samples generated numerically based on the estimated memory values: $\hat{m}^{\text{PG}}=1$ for (a) and $\hat{m}^{\text{PG}}=0$ for (b).
  • Figure 4: Estimated first-order transition probabilities for each station-month pair are shown as dots, with colors indicating the corresponding value of $\hat{\mathcal{G}}_0$. Dashed lines mark $\hat{p}(0|0)=0.5$, $\hat{p}(1|1)=0.5$, and the diagonal $\hat{p}(0|0)=1-\hat{p}(1|1)$ (iid case).
  • Figure 5: Seasonal averages of $\hat{\mathcal{G}}_0$ for each station. Winter (December–February) in panel (a); spring (March–May) in panel (b); summer (June–August) in panel (c); and autumn (September–November) in panel (d). The colorbar is saturated at the $99$th percentile ($\sim 0.11$) to enhance the visibility of lower values.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Proposition 6