Probabilistic and Alarm-Based Evaluation of a b-Value-Driven Deep Learning Earthquake Forecast

Jonas Köhler; Wei Li; Johannes Faber; Georg Rümpker; Nishtha Srivastava

TL;DR

Spatiotemporal variations in b-values contain a persistent, though limited, signal relevant to probabilistic earthquake forecasting, yielding marginal but consistent improvements over baseline models across complementary evaluation frameworks.

Abstract

We evaluate the forecasting performance of a deep learning model, originally introduced as a pattern-extraction framework, that operates on the spatiotemporal evolution of seismic b-values in a short-term forecasting context. Model output is rescaled to account for training on balanced datasets and evaluated relative to a spatial base-rate model using the Brier Skill Score (BSS). Absolute skill values are small, but mean BSS values are consistently positive, including at locations where $\mathrm{M_{W}} \geq 5$ earthquakes occurred during the test period, indicating information content beyond historical seismicity alone. Alarm-based evaluation using Molchan diagrams shows elevated event capture rates at low alarm fractions (5.88 percent of events captured at 1 percent area under alarm), indicating discrimination exceeding random and purely spatial reference models under constrained alarm conditions. Comparison with ETAS-derived triggered probabilities further reveals a weak positive correlation, suggesting partial sensitivity of the model output to seismic regimes characterized by enhanced clustering and recent activity, while remaining distinct from classical aftershock-based descriptions. Together, these results indicate that spatiotemporal variations in b-values contain a persistent, though limited, signal relevant to probabilistic earthquake forecasting, yielding marginal but consistent improvements over baseline models across complementary evaluation frameworks.
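
The rescaling and BSS evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard logit-space prior correction for a classifier trained on balanced classes and the usual definition BSS = 1 - BS_model / BS_ref. The function names, the `train_rate=0.5` default, and the toy arrays are hypothetical and are not taken from the paper, whose actual rescaling functions may differ.

```python
import numpy as np

def prior_correct(p_balanced, event_rate, train_rate=0.5):
    """Shift probabilities from the training class prior to a target event rate.

    Assumption: a plain logit-space prior shift, not necessarily the paper's method.
    """
    p = np.clip(np.asarray(p_balanced, dtype=float), 1e-12, 1.0 - 1e-12)
    logit = np.log(p / (1.0 - p))
    # Difference between the target log-odds and the training log-odds
    shift = (np.log(event_rate / (1.0 - event_rate))
             - np.log(train_rate / (1.0 - train_rate)))
    return 1.0 / (1.0 + np.exp(-(logit + shift)))

def brier_score(p, y):
    """Mean squared difference between forecast probability and binary outcome."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

def brier_skill_score(p_model, p_ref, y):
    """BSS > 0 means the model beats the reference forecast; BSS = 1 is perfect."""
    return 1.0 - brier_score(p_model, y) / brier_score(p_ref, y)

# Toy illustration with made-up numbers (not data from the paper)
y_toy = np.array([0, 0, 0, 1])
p_raw = np.array([0.2, 0.4, 0.3, 0.9])   # balanced-classifier output
base_rate = 0.25                         # hypothetical historical event rate
p_model = prior_correct(p_raw, base_rate)
p_ref = np.full(y_toy.shape, base_rate)  # constant base-rate reference
print("BSS:", brier_skill_score(p_model, p_ref, y_toy))
```

In this sketch the reference is a single constant rate. A spatial base-rate model such as the one described in the abstract would instead supply one reference probability per spatial bin, which only changes the contents of `p_ref`.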

Paper Structure (24 sections, 9 equations, 6 figures, 1 table)

Introduction
Data and Base Model Summary
Earthquake Catalog and Preprocessing
Construction of $b$-Value Fields
Base Model Architecture
Progressive Training Scheme
Base Model Output
Theoretical Background
Brier Score
Brier Skill Score
Molchan Diagrams
Forecast Construction via Rescaling
Brier-score–based rate rescaling
Logit-based rescaling
Molchan style Analysis
...and 9 more sections

Figures (6)

Figure 1: Evaluation of model output rescaling using the Brier Skill Score (BSS) relative to the historical spatial bin-event rate. Each panel shows the BSS as a function of a multiplicative scaling factor and the mean number of earthquakes $\overline{n_{\mathrm{eq}}}$ used for local $b$-value estimation. The left panel corresponds to the training (calibration) epochs, and the right panel to the independent test epochs; BSS values are clipped at $-1$ for visualization. Black circles indicate the empirical BSS-based rescaling obtained by fitting along the ridge of positive skill. Green circles show the logit-based prior correction with the global event rate fixed from the training set. Orange circles denote a logit-based rescaling with an additional offset calibrated to optimize BSS. All rescaling functions are fitted on the training period and shown on the test period for diagnostic comparison only. Violin plots illustrate the distribution of $\overline{n_{\mathrm{eq}}}$ for all evaluated space-time samples (left) and for samples culminating in an earthquake with $\mathrm{M_{W}} \geq 5$ (right) in each panel. The red horizontal lines to the left result from fewer $\mathrm{M_{W}} \geq 5$ events occurring in this bin in the test epochs.
Figure 2: Molchan diagram for Model4.9 showing the event capture rates at $1\%$ and $5\%$ alarm fractions and the AUC, each with its $95\%$ confidence interval.
Figure 3: Completeness map including the ETAS region outline and the $\mathrm{M_{W}} \geq 5$ events. Completeness was determined by collecting the earthquakes within a $1^\circ$ radius of the center of each cell and then applying the method in Godano2023 to determine $M_c$. As we do not visually inspect the curve for each cell, we determine $M_c$ as the smallest $m_{th}$ with $m_a(m_{th}) > \min(m_a) +0.3$. The red marks correspond to the events eligible for the model by being within $1.6^\circ$ from the edge, a limit shown by the dotted gray line. The green marks correspond to $\mathrm{M_{W}} \geq 5$ events not included in the test data (no event from this region is the target in either training or testing).
Figure 4: Time-magnitude plot for the test set used in this work. The events are colored by the logit transform of the ETAS-derived probability that the earthquake is triggered. Earthquakes below the ETAS inversion threshold of 3.5 are shown in black with low opacity.
Figure 5: Scatter plot of model output against ETAS background probability. The scatter shows a small negative correlation between background probability and model output. The fit is a linear regression. Confidence intervals ($95\%$) on the noted correlations and the fit were calculated using 10,000 bootstrap resamples.
...and 1 more figures
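
The alarm-based evaluation summarized in the Figure 2 caption can be sketched as follows. This is a minimal illustration that assumes equally weighted space-time cells, so that the alarm fraction is simply the share of cells under alarm, and that ties in the scores need no special handling. The function names and the trapezoidal area convention are assumptions made for illustration; the paper's alarm measure and AUC definition may differ.

```python
import numpy as np

def molchan_curve(scores, events):
    """Return (alarm_fraction, miss_rate) when cells are alarmed by decreasing score.

    Assumption: every cell carries equal weight in the alarm fraction.
    """
    scores = np.asarray(scores, dtype=float)
    events = np.asarray(events, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score alarmed first
    hits = np.cumsum(events[order])             # events captured so far
    n = scores.size
    tau = np.concatenate(([0.0], np.arange(1, n + 1) / n))   # alarm fraction
    nu = np.concatenate(([1.0], 1.0 - hits / events.sum()))  # miss rate
    return tau, nu

def capture_rate_at(tau, nu, alarm_fraction):
    """Fraction of events captured at the largest tau not exceeding alarm_fraction."""
    idx = np.searchsorted(tau, alarm_fraction, side="right") - 1
    return 1.0 - nu[idx]

def area_above_curve(tau, nu):
    """One common skill summary: 1 minus the trapezoidal area under (tau, nu).

    A value of 0.5 corresponds to the random-guess diagonal.
    """
    area_under = np.sum(np.diff(tau) * (nu[1:] + nu[:-1]) / 2.0)
    return 1.0 - float(area_under)

# Toy illustration with made-up scores and outcomes (not data from the paper)
toy_scores = [0.9, 0.1, 0.8, 0.2]
toy_events = [1, 0, 0, 1]
tau, nu = molchan_curve(toy_scores, toy_events)
print("capture rate at 50% alarm:", capture_rate_at(tau, nu, 0.5))
print("area above curve:", area_above_curve(tau, nu))
```

Confidence intervals such as those quoted in the caption could be obtained by bootstrapping over cells or events, which is not shown here.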
