Exploration of Machine Learning Methods to Seismic Event Discrimination in the Pacific Northwest

Akash Kharita, Marine Denolle, Alexander R Hutko, J. Renate Hartog, Stephen D. Malone

TL;DR

This study tackles four-class seismic event discrimination (earthquakes, explosions, surface events, and noise) in the Pacific Northwest by comparing classic, feature-engineered machine learning (CML) with end-to-end deep learning (DL) under a unified multi-class framework. Using ~200k three-component waveforms from >70k events, it benchmarks random forests on engineered feature sets against CNNs on time-series or spectrogram inputs, and finds that spectrogram-based CNNs achieve the highest within-domain and out-of-domain accuracy (>92%). The lightweight QuakeXNet-2D (~70k parameters) scans a full day of data in ~9 s on commodity hardware, fast enough for real-time use; together with SeismicCNN-2D, it shows DL models outperforming CML in both accuracy and efficiency. Generalization experiments show that diverse, augmented training data are needed to handle out-of-domain surface events and near-field explosions, with the Version 3 data improvements yielding the best overall robustness. The work provides deployable workflows via SeisBench and cloud pipelines that support real-time monitoring and transferable surface-event catalogs, and it releases the dataset and models publicly for reproducibility and regional extension.

Abstract

Accurately separating tectonic, anthropogenic, and geomorphologic seismic sources is essential for Pacific Northwest (PNW) monitoring but remains difficult as networks densify and signals overlap. Prior work largely treats binary discrimination and seldom compares classic ML (feature-engineered) and deep learning (end-to-end) approaches under a common, multi-class setting with operational constraints. We evaluate methods and features for four-way source discrimination (earthquakes, explosions, surface events, and noise) and identify models that are both accurate and deployable. Using ~200k three-component waveforms from >70k events in an AI-curated PNW dataset, we test random-forest classifiers on TSFEL, physics-informed, and scattering features, and CNNs that ingest time series (1D) or spectrograms (2D); we benchmark on a balanced common test set, a 10k-event network dataset, and out-of-domain data (global surface events; near-field blasts). CNNs that take spectrograms as input lead with accuracy above 92%, both within-domain (the short-and-fat SeismicCNN-2D) and out-of-domain (the long-and-skinny QuakeXNet-2D), versus 89% for the best random forest; performance remains strong at low SNR and longer distances, and generalizes to independent network and global datasets. QuakeXNet-2D is lightweight (~70k parameters; ~1.2 MB), is integrated into SeisBench with released checkpoints, and scans a full day of 100 Hz, three-component data in ~9 s on commodity hardware. These results show that spectrogram-based CNNs provide state-of-the-art accuracy, efficiency, and robustness for real-time PNW operations and transferable surface-event monitoring.
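To put the quoted scan time in perspective, the following back-of-the-envelope sketch converts the abstract's figures (100 Hz, three components, ~9 s per day of data) into a sample count and a speed-up relative to real time. The variable names are illustrative, and the 9 s value is the approximate figure from the abstract, not a measured benchmark.

```python
# Rough throughput arithmetic based only on numbers quoted in the abstract.
SAMPLING_RATE_HZ = 100            # "100 Hz"
N_COMPONENTS = 3                  # "three-component data"
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 s in one day
SCAN_TIME_S = 9.0                 # "~9 s" (approximate value)

# Samples in one day of data, per component and in total.
samples_per_component = SAMPLING_RATE_HZ * SECONDS_PER_DAY
total_samples = samples_per_component * N_COMPONENTS

# How many times faster than real time the scan runs.
speedup_vs_real_time = SECONDS_PER_DAY / SCAN_TIME_S

# Average number of samples processed per second of compute.
samples_per_second = total_samples / SCAN_TIME_S

print(f"samples per component per day: {samples_per_component:,}")
print(f"total samples per day:         {total_samples:,}")
print(f"speed-up vs. real time:        {speedup_vs_real_time:,.0f}x")
print(f"samples processed per second:  {samples_per_second:,.0f}")
```

Under these assumptions, one day of data is about 26 million samples, and the scan runs roughly four orders of magnitude faster than real time.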

Paper Structure

This paper contains 45 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Map of seismic events in the curated catalog by Ni et al. (2023). Earthquakes (blue circles) and explosions (purple) are located by the PNSN. Surface events are only marked at the seismic stations where they are recorded (green triangles).
  • Figure 2: Random examples of waveforms in each class.
  • Figure 3: Example of a processed single-component waveform, its corresponding spectrogram, and Fourier spectrum of (a) earthquake, (b) explosion, (c) noise, and (d) surface event.
  • Figure 4: Two main approaches to supervised machine learning in event discrimination: Classic Machine Learning (CML) requires feature engineering before classification. Deep Learning (DL) encompasses feature extraction and classification within a single optimization framework. Input data may be the raw time series, the Fourier Amplitude Spectrum, or the short-time Fourier amplitude spectrum (i.e., spectrograms). Features are either extracted and selected or transformed before CML classification, or they are solved in a single network using deep learning (DL). Finally, the four classes are predicted: Eq (earthquake), Exp (explosion), No (noise), and Su (surface events).
  • Figure 5: CML Model Performance. Precision (a), recall (b), and accuracy (c) for random forest models trained on different feature sets and waveform configurations.
  • ...and 5 more figures
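
The precision, recall, and accuracy shown in Figure 5 are standard multi-class metrics. As a minimal sketch of how they can be computed for the four classes named in Figure 4 (Eq, Exp, No, Su), the example below uses a small set of made-up labels. It is not the authors' evaluation code, and the helper name `per_class_metrics` is hypothetical.

```python
from collections import Counter

# Class abbreviations as used in the Figure 4 caption.
CLASSES = ["Eq", "Exp", "No", "Su"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return ({class: (precision, recall)}, overall accuracy)."""
    # Count every (true label, predicted label) pair: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]  # correctly classified examples of class c
        predicted = sum(n for (t, p), n in pairs.items() if p == c)
        actual = sum(n for (t, p), n in pairs.items() if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics[c] = (precision, recall)
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)
    return metrics, accuracy


# Toy labels, invented purely for illustration.
y_true = ["Eq", "Eq", "Exp", "Exp", "No", "No", "Su", "Su"]
y_pred = ["Eq", "Exp", "Exp", "Exp", "No", "No", "Su", "Eq"]

metrics, accuracy = per_class_metrics(y_true, y_pred)
for name, (precision, recall) in metrics.items():
    print(f"{name:>3}: precision={precision:.2f} recall={recall:.2f}")
print(f"accuracy={accuracy:.2f}")
```

On a balanced test set such as the one described in the abstract, overall accuracy equals the mean of the per-class recalls, which is one reason balanced benchmarks make accuracy easier to interpret.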