SeisT: A foundational deep learning model for earthquake monitoring tasks

Sen Li; Xu Yang; Anye Cao; Changbin Wang; Yaoqi Liu; Yapeng Liu; Qiang Niu

TL;DR

A foundational deep learning model, the Seismogram Transformer (SeisT), designed for a variety of earthquake monitoring tasks; experiments suggest that SeisT has the potential to contribute to the advancement of seismic signal processing and earthquake research.

Abstract

Seismograms, the fundamental seismic records, have revolutionized earthquake research and monitoring. Recent advances in deep learning have further enhanced seismic signal processing, enabling more precise and effective earthquake monitoring. This paper introduces a foundational deep learning model, the Seismogram Transformer (SeisT), designed for a variety of earthquake monitoring tasks. SeisT combines multiple modules tailored to different tasks and exhibits strong out-of-distribution generalization, outperforming or matching state-of-the-art models in earthquake detection, seismic phase picking, first-motion polarity classification, magnitude estimation, back-azimuth estimation, and epicentral distance estimation. The performance scores on these tasks, with P-phase and S-phase picking scored separately, are 0.96, 0.96, 0.68, 0.95, 0.86, 0.55, and 0.81, respectively. Compared with existing models, the largest improvements are in P-phase picking, S-phase picking, and magnitude estimation, with gains of 1.7%, 9.5%, and 8.0%, respectively. Rigorous experiments and evaluations suggest that SeisT has the potential to contribute to the advancement of seismic signal processing and earthquake research.

Paper Structure (18 sections, 25 equations, 10 figures, 1 table)

Introduction
Methods
Network architecture
Network design
Multi-scaled mixed convolution
Multi-path Transformer
Local aggregation
Data and Results
Data and labeling
Evaluation
Training details
Results
Comparison with other methods
Ablation study
Discussion
...and 3 more sections

Figures (10)

Figure 1: SeisT model architecture. The model takes a normalized three-component seismic waveform as input, preprocesses it with a stem module of $S$ stem layers, and maps it to a high-dimensional feature space through $L$ body modules. The $i$-th body module consists of a local aggregation module, $C_i$ multi-scaled mixed convolution (MSMC) modules, and $T_i$ multi-path Transformer (MPT) modules. The model is built from several basic modules, including the Group Convolution Block (GCB), Local Aggregation Multi-Head Attention (LAMHA), Depthwise Separable Convolution (DSConv), and Multi-Layer Perceptron (MLP). Finally, the head module produces a task-specific output vector.
Figure 2: Multi-path Transformer architecture. The input tensor is first projected into two distinct lower-dimensional subspaces. The two projections are then processed in parallel, one by a Transformer with local aggregation and the other by a grouped convolution block. Finally, the outputs of the two paths are concatenated and fused by a multi-layer perceptron.
Figure 3: Local aggregation module. This module shortens the time dimension, fuses local features through two transformation functions, increases the channel depth with a linear projection layer, and normalizes the result with BatchNorm.
Figure 4: Geographical distribution of the DiTing and PNW datasets. (a) Comparison of the seismic event distributions of the two datasets. (b) Distribution of seismic events in the DiTing dataset (R1). (c) Distribution (R2), magnitudes, and source depths of seismic events in the PNW dataset.
Figure 5: Phase picking and earthquake detection examples. (a-d) Event examples from the test dataset spanning a range of SNRs, epicentral distances, and magnitudes. (a) and (b) have relatively high SNRs of 29 dB and 43 dB, respectively; the event in (a) has magnitude 1.7 and epicentral distance 37.5 km, and the event in (b) has magnitude 2.1 and epicentral distance 98 km. (c) and (d) have lower SNRs of 4 dB and 1 dB, respectively; the event in (c) has magnitude 1.5 and epicentral distance 16 km, and the event in (d) has magnitude 5.0 and epicentral distance 216 km. In these examples, Z, N, and E denote the seismogram components. $P$ and $S$ denote the manually picked P-wave and S-wave arrivals, and $\hat{P}$ and $\hat{S}$ denote the predicted P-wave and S-wave arrival probabilities. $\hat{D}$ denotes the predicted probability that a sample lies within the event window, which extends from the P-wave arrival to the end of the S-wave coda.
...and 5 more figures
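The data flow described in the Figure 1 caption can be traced at the level of tensor shapes. The plain-Python sketch below does this for a hypothetical configuration; the input length, channel widths, the time-reduction factor of 2, and the names `BodyConfig` and `trace_shapes` are illustrative assumptions, not values or code from the paper. It also assumes that the stem (its $S$ layers collapsed into one step) keeps the input length, that each body module starts with its local aggregation module, and that MSMC and MPT modules preserve shape.

```python
from dataclasses import dataclass


@dataclass
class BodyConfig:
    """Hypothetical settings for the i-th body module (C_i and T_i as in Figure 1)."""
    out_channels: int    # channel depth after the local aggregation module (assumed)
    num_msmc: int        # C_i, number of MSMC modules
    num_mpt: int         # T_i, number of MPT modules
    downsample: int = 2  # assumed time-reduction factor of the local aggregation module


def trace_shapes(length, stem_channels, bodies):
    """List (stage name, (channels, time)) for a SeisT-like encoder.

    Assumptions, not taken from the paper: the stem maps the 3 input components
    to `stem_channels` without changing the length, the local aggregation module
    comes first in each body module, and MSMC/MPT modules preserve shape.
    """
    trace = [("input", (3, length))]
    channels, time = stem_channels, length
    trace.append(("stem", (channels, time)))
    for i, body in enumerate(bodies, start=1):
        # Local aggregation: shorter time axis, deeper channel axis.
        time = -(-time // body.downsample)  # ceiling division
        channels = body.out_channels
        trace.append((f"body{i}.local_aggregation", (channels, time)))
        for j in range(body.num_msmc):
            trace.append((f"body{i}.msmc{j + 1}", (channels, time)))
        for j in range(body.num_mpt):
            trace.append((f"body{i}.mpt{j + 1}", (channels, time)))
    return trace


if __name__ == "__main__":
    # Hypothetical configuration: three body modules on an 8192-sample window.
    bodies = [BodyConfig(32, 1, 1), BodyConfig(64, 1, 2), BodyConfig(128, 0, 2)]
    for name, shape in trace_shapes(8192, 16, bodies):
        print(f"{name:26s} {shape}")
```

The task-specific head is left out because its output form depends on the task (for example, per-sample probability traces for phase picking versus a single value for magnitude estimation).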
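The two-path structure in Figure 2 can be sketched in dependency-free Python. This is a simplified illustration under stated assumptions, not the authors' implementation: plain single-head self-attention without learned query/key/value weights stands in for LAMHA (the local aggregation step is omitted), each path receives half of the channels, the MLP uses a ReLU activation, and the helper names (`linear`, `self_attention`, `grouped_conv1d`, `init_params`, `multi_path_block`) are hypothetical.

```python
import math
import random


def linear(x, w):
    """Multiply a T x n sequence by an n x m weight matrix (no bias)."""
    return [[sum(v * w[i][j] for i, v in enumerate(row)) for j in range(len(w[0]))]
            for row in x]


def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    Stands in for LAMHA; learned Q/K/V weights and local aggregation are omitted.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) / total for j in range(d)])
    return out


def grouped_conv1d(x, kernels, groups):
    """Same-padded 1D convolution over time with channel groups.

    kernels[c][ci][tap] is the weight from in-group input channel ci to output
    channel c; each output channel only sees the channels of its own group.
    """
    T, C = len(x), len(x[0])
    per_group = C // groups
    k = len(kernels[0][0])
    pad = k // 2
    out = [[0.0] * C for _ in range(T)]
    for c in range(C):
        first = (c // per_group) * per_group  # first input channel of c's group
        for t in range(T):
            acc = 0.0
            for ci in range(per_group):
                for tap in range(k):
                    src = t + tap - pad
                    if 0 <= src < T:
                        acc += kernels[c][ci][tap] * x[src][first + ci]
            out[t][c] = acc
    return out


def init_params(channels, groups, kernel=3, hidden_mult=2, seed=0):
    """Random weights; each path works on channels // 2 features (assumed split)."""
    rng = random.Random(seed)

    def mat(n, m):
        return [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(m)] for _ in range(n)]

    d = channels // 2
    return {
        "proj_attn": mat(channels, d),
        "proj_conv": mat(channels, d),
        "conv": [[[rng.gauss(0.0, 0.3) for _ in range(kernel)]
                  for _ in range(d // groups)] for _ in range(d)],
        "mlp_in": mat(channels, hidden_mult * channels),
        "mlp_out": mat(hidden_mult * channels, channels),
    }


def multi_path_block(x, params, groups):
    """Project into two subspaces, run both paths, concatenate, fuse with an MLP."""
    attn_path = self_attention(linear(x, params["proj_attn"]))
    conv_path = grouped_conv1d(linear(x, params["proj_conv"]), params["conv"], groups)
    fused = [a + b for a, b in zip(attn_path, conv_path)]  # channel concatenation
    hidden = [[max(0.0, v) for v in row] for row in linear(fused, params["mlp_in"])]  # ReLU
    return linear(hidden, params["mlp_out"])
```

With `x` holding 10 time steps of 8 channels, `multi_path_block(x, init_params(8, groups=2), groups=2)` returns another 10 x 8 sequence, so blocks of this shape can be stacked. In the grouped convolution, each output channel mixes only the input channels of its own group, which needs fewer weights than a full convolution over all channels.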
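The four operations listed in the Figure 3 caption can be sketched as a single function. The caption does not name the two transformation functions, so this sketch assumes windowed mean pooling and max pooling fused by addition, which also performs the time reduction; the stride, the projection matrix, and the name `local_aggregate` are hypothetical. BatchNorm is shown in training mode, using batch statistics and omitting its learned scale and shift.

```python
import math


def local_aggregate(batch, stride, proj, eps=1e-5):
    """Sketch of a local aggregation step on a B x T x C batch (nested lists).

    Returns a B x ceil(T / stride) x C_out batch, where C_out = len(proj[0]).
    """
    # 1-2. Shorten the time axis by fusing two local transformations per window.
    #      Assumed choice: mean pooling + max pooling, combined by addition.
    pooled = []
    for seq in batch:
        rows = []
        for start in range(0, len(seq), stride):
            window = seq[start:start + stride]
            mean = [sum(col) / len(window) for col in zip(*window)]
            peak = [max(col) for col in zip(*window)]
            rows.append([m + p for m, p in zip(mean, peak)])
        pooled.append(rows)
    # 3. Linear projection from C to C_out channels (deeper channel axis).
    c_out = len(proj[0])
    projected = [[[sum(v * proj[i][j] for i, v in enumerate(row)) for j in range(c_out)]
                  for row in seq] for seq in pooled]
    # 4. BatchNorm in training mode without learned scale/shift: normalize each
    #    channel with its mean and variance over all batch items and time steps.
    values = [[row[j] for seq in projected for row in seq] for j in range(c_out)]
    means = [sum(v) / len(v) for v in values]
    stds = [math.sqrt(sum((x - mu) ** 2 for x in v) / len(v) + eps)
            for v, mu in zip(values, means)]
    return [[[(row[j] - means[j]) / stds[j] for j in range(c_out)] for row in seq]
            for seq in projected]
```

For example, a batch of 2 traces with 5 time steps and 2 channels, processed with `stride=2` and a 2 x 3 projection matrix, comes out as 2 traces with 3 time steps and 3 channels, each channel normalized to roughly zero mean and unit variance.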
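The probability traces in Figure 5 ($\hat{P}$, $\hat{S}$, $\hat{D}$) are per-sample outputs, so training would need per-sample targets and inference a decoding step. The sketch below shows one common way to do both; the Gaussian shape and width of the P and S targets, the decision thresholds, and the names `make_labels`, `pick_arrival`, and `detect_window` are assumptions for illustration, not details confirmed by the paper.

```python
import math


def make_labels(n_samples, p_idx, s_idx, coda_end_idx, sigma=20.0):
    """Hypothetical per-sample training targets for one trace.

    p, s: Gaussian bumps (peak 1.0) centred on the manual P and S arrival
          samples; the Gaussian shape and sigma (in samples) are assumptions.
    d:    1.0 from the P arrival to the end of the S-wave coda, else 0.0.
    """
    p = [math.exp(-((t - p_idx) ** 2) / (2 * sigma ** 2)) for t in range(n_samples)]
    s = [math.exp(-((t - s_idx) ** 2) / (2 * sigma ** 2)) for t in range(n_samples)]
    d = [1.0 if p_idx <= t <= coda_end_idx else 0.0 for t in range(n_samples)]
    return p, s, d


def pick_arrival(prob, threshold=0.3):
    """Sample index of the highest probability, or None if it stays below threshold."""
    best = max(range(len(prob)), key=prob.__getitem__)
    return best if prob[best] >= threshold else None


def detect_window(prob, threshold=0.5):
    """(start, end) of the first contiguous run with prob >= threshold, or None."""
    start = None
    for t, v in enumerate(prob):
        if v >= threshold and start is None:
            start = t
        elif v < threshold and start is not None:
            return start, t - 1
    return (start, len(prob) - 1) if start is not None else None
```

For a 1000-sample trace with P at sample 200, S at sample 450, and the coda ending at sample 800, decoding the targets themselves returns picks at 200 and 450 and a detection window of (200, 800), a quick check that labeling and decoding agree.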