High Resolution Seismic Waveform Generation using Denoising Diffusion

Kadek Hendrawan Palgunadi, Andreas Bergmeister, Andrea Bosisio, Laura Ermert, Maria Koroni, Nathanaël Perraudin, Simon Dirmeier, Men-Andrin Meier

TL;DR

HighFEM is a latent denoising diffusion model for conditional seismic waveform synthesis: a variational autoencoder maps spectrograms to a latent space, and the conditional latent distribution $p_{\text{latent}}(z \mid c)$ is learned with forward and backward stochastic differential equations. The approach generates realistic, high-frequency waveforms (up to 50 Hz) conditioned on magnitude, distance, site velocity, depth, and station distribution. The synthetics align closely with real data in time-domain envelopes, Fourier spectra, and scalar ground-motion statistics, often matching or outperforming traditional ground-motion models (GMMs). The evaluation includes Fréchet-based spectral and embedding distances, and an open-source library (tqdne) for training and deploying generative waveform models (GWMs) enables community benchmarking and regionalized hazard-scenario generation. Overall, HighFEM offers a scalable, waveform-centric alternative to conventional GMMs and physics-based simulations, with the potential to enrich probabilistic seismic hazard assessment and nonlinear structural analyses through diverse, physically plausible waveform ensembles.
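
For orientation, the forward and backward stochastic differential equations mentioned above can be written in the generic score-based form shown below. This is a sketch of the standard formulation, not necessarily the paper's exact parameterization: the drift $f$, the diffusion coefficient $g$, and the Wiener processes $w$ and $\bar{w}$ are generic symbols assumed here, and the summary does not specify the noise schedule.

```latex
% Forward (noising) SDE on the latent variable z
\mathrm{d}z = f(z, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w

% Backward (denoising) SDE, run in reverse time and conditioned on c
\mathrm{d}z = \left[ f(z, t) - g(t)^{2}\, \nabla_{z} \log p_{t}(z \mid c) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w}
```

In this form, the trained network approximates the conditional score $\nabla_{z} \log p_{t}(z \mid c)$, and sampling integrates the backward equation from noise to a latent that the autoencoder's decoder maps back to a spectrogram.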

Abstract

Accurate prediction and synthesis of seismic waveforms are crucial for seismic-hazard assessment and earthquake-resistant infrastructure design. Existing prediction methods, such as ground-motion models and physics-based wave-field simulations, often fail to capture the full complexity of seismic wavefields, particularly at higher frequencies. This study introduces HighFEM, a novel, computationally efficient, and scalable (i.e., capable of generating many seismograms simultaneously) generative model for high-frequency seismic-waveform generation. Our approach leverages a spectrogram representation of the seismic-waveform data, which is reduced to a lower-dimensional manifold via an autoencoder. A state-of-the-art diffusion model is trained to generate this latent representation conditioned on key input parameters: earthquake magnitude, recording distance, site conditions, hypocenter depth, and azimuthal gap. The model generates waveforms with frequency content up to 50 Hz. Any scalar ground-motion statistic, such as peak ground-motion amplitudes and spectral accelerations, can be readily derived from the synthesized waveforms. We validate our model using commonly employed seismological metrics and performance metrics from image-generation studies. Our results demonstrate that the openly available model can generate realistic high-frequency seismic waveforms across a wide range of input parameters, even in data-sparse regions. For the scalar ground-motion statistics commonly used in seismic-hazard and earthquake-engineering studies, we show that our model accurately reproduces both the median trends of the real data and their variability. To evaluate and compare the growing number of these and similar Generative Waveform Models (GWMs), we argue that they should be openly available and included in community ground-motion-model evaluation efforts.
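
The statement that scalar ground-motion statistics can be derived from the synthesized waveforms can be illustrated with a minimal sketch. It assumes a single-component acceleration trace in m/s² sampled at 100 Hz (a rate consistent with the 50 Hz upper frequency, but not stated here); the function name `peak_ground_motion` is hypothetical and is not part of the tqdne API.

```python
import numpy as np


def peak_ground_motion(acc, dt):
    """Return (PGA, PGV) for one acceleration trace.

    acc : 1-D acceleration samples (assumed m/s^2)
    dt  : sampling interval in seconds (assumed 0.01 s below)
    """
    acc = np.asarray(acc, dtype=float)
    # PGA is the peak absolute acceleration.
    pga = np.max(np.abs(acc))
    # Velocity by trapezoidal integration, starting from rest.
    steps = 0.5 * (acc[1:] + acc[:-1]) * dt
    vel = np.concatenate(([0.0], np.cumsum(steps)))
    # PGV is the peak absolute velocity.
    pgv = np.max(np.abs(vel))
    return pga, pgv


# Illustrative input: a 1 Hz cosine with 2 m/s^2 amplitude (not real data).
dt = 0.01
t = np.arange(0.0, 10.0, dt)
acc = 2.0 * np.cos(2.0 * np.pi * 1.0 * t)
pga, pgv = peak_ground_motion(acc, dt)
```

For this test signal the analytic peak velocity is the amplitude divided by $2\pi f$, which gives a quick sanity check on the integration. Real records would normally be detrended and filtered before integration, a step omitted in this sketch.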

Paper Structure

This paper contains 38 sections, 16 equations, 70 figures, 3 tables.

Figures (70)

  • Figure 1: Real three-component acceleration seismograms (grey) and 5 randomly selected examples of GWM synthetics (alternating shades of red for improved visibility), for three sets of conditioning parameters: magnitude, hypocentral distance, $V_{S30}$, hypocenter depth, and azimuthal gap. Peak absolute amplitudes are given above each seismogram.
  • Figure 2: First-order seismogram characteristics of real data (left) and GWM synthetics (right) for hypocentral distances of 50–70 km in 5 different magnitude bins. (a) and (b) Distribution of time-domain envelopes for the radial component of the acceleration seismograms, shown as the mean (solid line) and the standard deviation (shaded areas). (c) and (d) Distribution of Fourier spectra log-amplitudes for the radial component of the acceleration seismograms. The sample counts for each bin, in ascending order of magnitude, are $882, 398, 140, 48, 24$.
  • Figure 3: Model bias as a function of hypocentral distance for the generative waveform model (red) and the GMMs by boore2014nga (blue) and kanno2006new (yellow) for PGA (a, c, and e) and PGV (b, d, and f), with respect to real data. Colored lines represent the mean of the ratio in 50 distance bins of 3.67 km width. The bars represent the standard deviation in each bin.
  • Figure 4: Histogram of (a) PGA and (b) PGV residuals showing the spread of the real data (black), of the GWM synthetics (red), of the boore2003simulation GMM (blue), and of the kanno2006new GMM (yellow), with respect to the simple fitted ground motion models (equations \ref{eq:GMPE_PGA} and \ref{eq:GMPE_PGV}) on a $\log_{10}$ scale. The box plot shows the median values, quantiles, and extreme values.
  • Figure 5: Shaking duration estimated using cumulative Arias Intensity (cAI). (a) cAI for a real example waveform (black line) and 100 GWM synthetics (red lines). Triangle-right and triangle-down symbols mark 5% and 95% of the maximum cAI, respectively, for the real data (white) and the GWM synthetics (red). (b) Shaking duration for real data (grey circles) and one GWM synthetic per real record (red triangles), with corresponding conditioning parameters. For each magnitude bin (bin width 0.08) from $M$ 4.0 to 6.0, grey dots and lines show the mean and standard deviation of the real data, while blue triangles and lines show the mean and standard deviation of the GWM synthetics.
  • ...and 65 more figures
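
The shaking-duration measure in Figure 5 can be sketched as follows. This is a minimal illustration, assuming the common definition of Arias Intensity, $\frac{\pi}{2g}\int a(t)^2\,\mathrm{d}t$, and a duration taken between the 5% and 95% levels of the final cumulative value; the function name `significant_duration`, the 100 Hz sampling rate, and the synthetic input are assumptions and do not come from the paper or the tqdne library.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2


def significant_duration(acc, dt, lo=0.05, hi=0.95):
    """Return (cumulative Arias Intensity, duration between lo and hi levels).

    acc : 1-D acceleration samples (assumed m/s^2)
    dt  : sampling interval in seconds
    """
    acc = np.asarray(acc, dtype=float)
    # Cumulative Arias Intensity via a simple rectangle-rule integral.
    cai = np.pi / (2.0 * G) * np.cumsum(acc ** 2) * dt
    # Normalize by the final (maximum) value so the curve ends at 1.
    norm = cai / cai[-1]
    # First sample at which each threshold is reached.
    t_lo = np.searchsorted(norm, lo) * dt
    t_hi = np.searchsorted(norm, hi) * dt
    return cai, t_hi - t_lo


# Illustrative input: noise shaped by a smooth envelope (not real data).
rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0.0, 40.0, dt)
envelope = t * np.exp(-t / 4.0)
acc = envelope * rng.standard_normal(t.size)
cai, duration = significant_duration(acc, dt)
```

Because the cumulative curve is non-decreasing, `np.searchsorted` finds the threshold crossings directly. Applying the same function to a real record and to its matching synthetics would give the kind of comparison shown in panel (b).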