Constructing balanced datasets for predicting failure modes in structural systems under seismic hazards

Jungho Kim, Taeyong Kim

TL;DR

This paper tackles the challenge of predicting structural failure modes under seismic hazards when training data are severely imbalanced, which limits machine learning (ML) performance for rare yet critical failures. It introduces a balanced-dataset framework built around three pillars: (1) selecting key ground motion features (GMFs) that capture ground-motion variability, (2) an ISNS-inspired adaptive density-estimation workflow that uses Gaussian process (GP) surrogates and n-ball sampling to identify failure-domain densities, and (3) scaling-factor optimization that maps GMF-space samples back to physically meaningful ground-motion time histories for nonlinear response history analyses (RHAs). The approach is demonstrated on a nine-story SAC steel building and a three-story moment-resisting frame (MRF), showing that balanced training improves cross-dataset accuracy and minority-mode identification compared with imbalanced training, while preserving performance across both recorded and synthetic motions. The results underscore the framework’s potential to enhance seismic resilience analysis and enable more reliable data-driven failure-mode predictions. Future work includes advanced deep learning architectures and integration with reliability-based design or resilience metrics.

Abstract

Accurate prediction of structural failure modes under seismic excitations is essential for seismic risk and resilience assessment. Traditional simulation-based approaches often result in imbalanced datasets dominated by non-failure or frequently observed failure scenarios, which limits the effectiveness of machine learning-based prediction. To address this challenge, this study proposes a framework for constructing balanced datasets that include distinct failure modes. The framework consists of three key steps. First, critical ground motion features (GMFs) are identified to effectively represent ground motion time histories. Second, an adaptive algorithm is employed to estimate the probability densities of various failure domains in the space of critical GMFs and structural parameters. Third, samples generated from these probability densities are transformed into ground motion time histories by using a scaling factor optimization process. A balanced dataset is constructed by performing nonlinear response history analyses on structural systems with parameters matching the generated samples, subjected to the corresponding transformed ground motion time histories. Deep neural network models are trained on balanced and imbalanced datasets to highlight the importance of dataset balancing. To further evaluate the framework's applicability, numerical investigations are conducted using two different structural models subjected to recorded and synthetic ground motions. The results demonstrate the framework's robustness and effectiveness in addressing dataset imbalance and improving machine learning performance in seismic failure mode prediction.
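
The third step, mapping a sampled GMF vector back to a ground motion time history, can be illustrated with a minimal sketch. It is not the paper's actual procedure: the two features used here (peak ground acceleration and cumulative absolute velocity), the log-space least-squares objective, and the function names `linear_features`, `optimal_scale`, and `pick_and_scale` are all assumptions chosen for illustration. Because both features scale linearly with amplitude, the best scale factor for each candidate record has a closed form, and the record with the smallest residual mismatch is selected.

```python
import numpy as np


def linear_features(acc, dt):
    """Two illustrative amplitude-linear features (an assumption, not the
    paper's GMF set): peak ground acceleration and cumulative absolute velocity."""
    pga = np.max(np.abs(acc))
    cav = np.sum(np.abs(acc)) * dt
    return np.array([pga, cav])


def optimal_scale(features, target):
    """Scale factor s minimizing sum_i (log(s * f_i) - log(t_i))**2.

    Setting the derivative with respect to log(s) to zero gives
    log(s) = mean(log(t_i) - log(f_i)).
    """
    return float(np.exp(np.mean(np.log(target) - np.log(features))))


def pick_and_scale(records, dt, target):
    """Return (index, scale, error) of the record whose optimally scaled
    features are closest to the target feature vector in log space."""
    best = None
    for idx, acc in enumerate(records):
        f = linear_features(acc, dt)
        s = optimal_scale(f, target)
        err = float(np.sum((np.log(s * f) - np.log(target)) ** 2))
        if best is None or err < best[2]:
            best = (idx, s, err)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dt = 0.01
    # Hypothetical stand-ins for a ground motion database.
    records = [rng.standard_normal(500) for _ in range(3)]
    # Hypothetical target: the features of record 1 amplified by 2.5.
    target = linear_features(2.5 * records[1], dt)
    print(pick_and_scale(records, dt, target))
```

In this toy setting the target is reachable exactly by scaling one record. A sampled GMF vector would generally leave a nonzero residual, and features that do not scale linearly with amplitude (duration measures, for example) would require a numerical optimizer instead of the closed form.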

Paper Structure

This paper contains 18 sections, 10 equations, 16 figures, 14 tables.

Figures (16)

  • Figure 1: Architecture of the DNN model for near-real-time failure mode prediction.
  • Figure 2: Proposed framework for constructing balanced datasets for failure mode prediction.
  • Figure 3: (a) Response spectra of selected ground motions with the target spectrum and (b) structural model. Figure (a) shows 100 response spectra with the median and 2.5%-97.5% quantiles of the target spectrum superimposed.
  • Figure 4: Histogram of failure modes obtained through MCS. This represents an imbalanced dataset dominated by the safe scenario.
  • Figure 5: R-squared values (first row) and incremental improvements (second row) for different GMF combinations: (a) IDR$_1$, (b) IDR$_2$, and (c) IDR$_3$.
  • ...and 11 more figures