Exploration of Machine Learning Methods to Seismic Event Discrimination in the Pacific Northwest
Akash Kharita, Marine Denolle, Alexander R Hutko, J. Renate Hartog, Stephen D. Malone
TL;DR
This study tackles four-class seismic event discrimination in the Pacific Northwest by comparing classical feature-engineered ML (CML) and end-to-end DL approaches under a unified multi-class framework. Using ~200k three-component waveforms from >70k events, it benchmarks random forests on engineered feature sets and CNNs on time-series or spectrogram inputs, and finds that spectrogram-based CNNs achieve the top within-domain and out-of-domain accuracy (>92%). The lightweight QuakeXNet-2D, with ~70k parameters, delivers strong performance and real-time throughput (a full day of data scanned in ~9 s on commodity hardware); together with SeismicCNN-2D, it shows that DL models outperform CML in both accuracy and efficiency. Generalization experiments reveal the need for diverse, augmented training data to handle out-of-domain surface events and near-field explosions, with Version 3 data improvements yielding the best overall robustness. The work provides deployable workflows via SeisBench and cloud pipelines, supporting real-time monitoring and transferable surface-event catalogs, and contributes a publicly available dataset and models for reproducibility and regional extension.
Abstract
Accurately separating tectonic, anthropogenic, and geomorphologic seismic sources is essential for Pacific Northwest (PNW) monitoring but remains difficult as networks densify and signals overlap. Prior work largely treats binary discrimination and seldom compares classical ML (feature-engineered) and deep learning (end-to-end) approaches under a common, multi-class setting with operational constraints. We evaluate methods and features for four-way source discrimination (earthquakes, explosions, surface events, and noise) and identify models that are both accurate and deployable. Using ~200k three-component waveforms from >70k events in an AI-curated PNW dataset, we test random-forest classifiers on TSFEL, physics-informed, and scattering features, and CNNs that ingest time series (1D) or spectrograms (2D); we benchmark on a balanced common test set, a 10k-event network dataset, and out-of-domain data (global surface events; near-field blasts). Spectrogram-based CNNs lead, with accuracy above 92% both within-domain (the short-and-fat SeismicCNN-2D) and out-of-domain (the long-and-skinny QuakeXNet-2D), versus 89% for the best random forest; performance remains strong at low SNR and longer distances, and generalizes to independent network and global datasets. QuakeXNet-2D is lightweight (~70k parameters; ~1.2 MB), is integrated into SeisBench, and scans a full day of 100 Hz, three-component data in ~9 s on commodity hardware; its checkpoints are released. These results show that spectrogram-based CNNs provide state-of-the-art accuracy, efficiency, and robustness for real-time PNW operations and transferable surface-event monitoring.
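To put the quoted throughput in perspective, the following is a rough back-of-the-envelope sketch in Python, not part of the released code. It assumes the "full day of 100 Hz, three-component data" refers to a single station and takes the ~9 s scan time at face value; the function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.
# Assumptions (not stated in the source): one station, continuous data,
# and exactly 9 s of scan time. All names here are illustrative.

SAMPLING_RATE_HZ = 100           # sampling rate quoted in the abstract
N_COMPONENTS = 3                 # three-component data
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
SCAN_TIME_S = 9.0                # approximate scan time quoted in the abstract


def samples_per_day(rate_hz: int, n_components: int) -> int:
    """Total number of samples in one day of continuous data, all components."""
    return rate_hz * SECONDS_PER_DAY * n_components


def realtime_factor(scan_time_s: float) -> float:
    """Seconds of recorded data processed per second of compute."""
    return SECONDS_PER_DAY / scan_time_s


total = samples_per_day(SAMPLING_RATE_HZ, N_COMPONENTS)
factor = realtime_factor(SCAN_TIME_S)

print(f"samples per station-day: {total:,}")
print(f"approximate real-time factor: {factor:,.0f}x")
```

Under these assumptions, one station-day is about 26 million samples and the scan runs several thousand times faster than real time, which is the margin that makes network-wide, near-real-time deployment plausible.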
