Data-Driven Prediction of Seismic Intensity Distributions Featuring Hybrid Classification-Regression Models

Koyu Mizutani; Haruki Mitarai; Kakeru Miyazaki; Soichiro Kumano; Toshihiko Yamasaki

TL;DR

This work develops data-driven linear regression and hybrid classification-regression models that predict seismic intensity distributions without geographic inputs, trained on 1,857 earthquakes of magnitude 5.0 or greater that occurred near Japan between 1997 and 2020. The approach represents intensity on a $64\times64$ grid, assigns depth and magnitude over a $k\times k$ area around the hypocenter, and compares classification, regression, and a hybrid of the two against conventional Ground Motion Prediction Equations (GMPEs). The hybrid model performs best on the correlation coefficient $r$, the F1 score, and the Matthews correlation coefficient (MCC), and it can capture abnormal intensity patterns that GMPEs miss, which is relevant to risk assessment and early warning. The dataset and code are openly published, supporting further research toward real-time prediction and subsurface characterization, potentially via NeRF-inspired density estimation.

Abstract

Earthquakes are among the most immediate and deadly natural disasters that humans face. Accurately forecasting the extent of earthquake damage and assessing potential risks can be instrumental in saving numerous lives. In this study, we developed linear regression models capable of predicting seismic intensity distributions based on earthquake parameters: location, depth, and magnitude. Because the approach is completely data-driven, it can predict intensity distributions without geographical information. The dataset comprises seismic intensity data from earthquakes that occurred in the vicinity of Japan between 1997 and 2020, specifically 1,857 earthquakes with a magnitude of 5.0 or greater, sourced from the Japan Meteorological Agency. We trained both regression and classification models and combined them into a hybrid model that takes advantage of both. The proposed model outperformed commonly used Ground Motion Prediction Equations (GMPEs) in terms of the correlation coefficient, F1 score, and MCC. Furthermore, the proposed model can predict even abnormal seismic intensity distributions, a task at which conventional GMPEs often struggle.
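
The three metrics named in the abstract can be illustrated with a small sketch. It assumes a binary reading of F1 and MCC (a cell counts as positive when its intensity reaches a chosen threshold); the paper's exact class definitions are not given here, so the function names and the threshold are hypothetical.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient over all grid cells."""
    return float(np.corrcoef(y_true.ravel(), y_pred.ravel())[0, 1])

def binary_f1_mcc(y_true, y_pred, threshold):
    """F1 and MCC after binarizing intensities at `threshold` (an assumption)."""
    t = y_true.ravel() >= threshold
    p = y_pred.ravel() >= threshold
    tp = float(np.sum(t & p))    # true positives
    tn = float(np.sum(~t & ~p))  # true negatives
    fp = float(np.sum(~t & p))   # false positives
    fn = float(np.sum(t & ~p))   # false negatives
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, float(mcc)

# Toy data: a perfect prediction, and one that is right half the time.
truth = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
r_perfect = pearson_r(truth, truth)
f1_perfect, mcc_perfect = binary_f1_mcc(truth, truth, threshold=3.0)
f1_half, mcc_half = binary_f1_mcc(
    np.array([1.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.0, 1.0, 0.0]), threshold=1.0
)
```

MCC uses all four confusion-matrix counts, so it stays informative when most grid cells record no shaking, a situation in which accuracy alone would look deceptively high.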

Paper Structure (18 sections, 6 equations, 8 figures, 1 table)

Introduction
Related Works
Preliminary
Methodology
Dataset
Inputs
Classification and Regression Model
Hybrid Model
Results
Evaluation Metrics
Qualitative Evaluation
Quantitative Evaluation
Distribution of Model Predictions
Predicting Abnormal Seismic Intensity Distributions
Discussions
...and 3 more sections

Figures (8)

Figure 1: Comparison of seismic intensity distributions for a JMA-magnitude-6.4 earthquake that occurred at approximately 00:03 JST, April 15, 2016, showing both the ground truth and our prediction. The hypocenter of the event is located at latitude $32^\circ42.0'$N, longitude $130^\circ46.6'$E, and a depth of $7$ km. Regions near the epicenter are cropped for better visualization.
Figure 2: Seismic intensity distribution of the 2004 Chuetsu earthquake. Observed seismic intensities are represented in each grid cell. The epicenter is marked with a red circle.
Figure 3: Illustration of the input format. The depth and magnitude values are assigned across $k \times k$ cells centered on the hypocenter's cell. All other cells remain 0.
Figure 4: Overview of the architecture used for both the classification and regression models. $64\times 64$ cells are flattened and input as a vector to the network. For the classification model, the network outputs the probabilities of seismic intensity classes at each cell. The regression model outputs continuous instrumental seismic intensity values.
Figure 5: Example comparison of seismic intensity distributions for a JMA-magnitude-6.4 earthquake that occurred at approximately 00:03 JST, April 15, 2016: ground truth, GMPEs, regression, classification, and hybrid models. The hypocenter of the event is located at latitude $32^\circ42.0'$N, longitude $130^\circ46.6'$E, and a depth of $7$ km. The regions near the epicenter are cropped for better visualization.
...and 3 more figures
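
The input format described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes depth and magnitude occupy two separate channels, that $k$ is odd, and that the patch is clipped at the grid border. The function name `make_input` and the choice to pass the hypocenter as a (row, col) cell index are hypothetical.

```python
import numpy as np

GRID = 64  # side length of the 64x64 grid used in the paper

def make_input(row, col, depth_km, magnitude, k=3):
    """Return a (2, 64, 64) array: channel 0 holds depth, channel 1 magnitude.

    Values fill a k x k patch centered on the hypocenter's cell; every
    other cell stays 0. The two-channel layout is an assumption.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the patch is centered on one cell")
    x = np.zeros((2, GRID, GRID), dtype=np.float32)
    half = k // 2
    # Clip the patch so it never extends past the grid border.
    r0, r1 = max(row - half, 0), min(row + half + 1, GRID)
    c0, c1 = max(col - half, 0), min(col + half + 1, GRID)
    x[0, r0:r1, c0:c1] = depth_km
    x[1, r0:r1, c0:c1] = magnitude
    return x

# Example with the depth and magnitude of the Figure 1 event; the cell
# index (40, 12) is arbitrary and only for illustration.
x = make_input(row=40, col=12, depth_km=7.0, magnitude=6.4, k=3)
corner = make_input(row=0, col=0, depth_km=7.0, magnitude=6.4, k=3)
```

Per the Figure 4 caption, the $64\times 64$ cells are then flattened into a vector before entering the network, which `x.reshape(-1)` would do under this layout.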
