Data-driven modeling of time-domain induced polarization

Charles L. Bérubé, Pierre Bérubé

TL;DR

This work introduces IP-VAE, a data-driven, unsupervised variational autoencoder for time-domain induced polarization (IP) data, trained on a compilation of 1 600 319 decay curves from Canada, the United States and Kazakhstan. The model supports four practical applications: generating representative synthetic IP decays, Bayesian denoising with uncertainty quantification, survey quality assessment via reconstruction-based signal-to-noise (S/N) metrics, and automatic outlier detection. A key finding is that a single latent scalar, strongly correlated with the average chargeability $\bar{m}$, suffices to encode the IP decays in the compilation, which calls into question IP parametrizations with more than one free parameter. The authors release an open-source, pre-trained IP-VAE that is applicable to new IP data and outline future directions toward full-waveform IP and spectral IP data, offering a data-driven path to standardized IP processing and interpretation.

Abstract

We present a novel approach for data-driven modeling of the time-domain induced polarization (IP) phenomenon using variational autoencoders (VAEs). VAEs are Bayesian neural networks that aim to learn a latent statistical distribution to encode extensive data sets as lower-dimensional representations. We collected 1 600 319 IP decay curves in various regions of Canada, the United States and Kazakhstan, and compiled them to train a deep VAE. The proposed deep learning approach is strictly unsupervised and data-driven: it does not require manual processing or ground-truth labeling of IP data. Moreover, our VAE approach avoids the pitfalls of IP parametrization with the empirical Cole-Cole and Debye decomposition models, simple power-law models, or other sophisticated mechanistic models. We demonstrate four applications of VAEs to model and process IP data: (1) representative synthetic data generation, (2) unsupervised Bayesian denoising and data uncertainty estimation, (3) quantitative evaluation of the signal-to-noise ratio, and (4) automated outlier detection. We also interpret the IP compilation's latent representation and reveal a strong correlation between its first dimension and the average chargeability of IP decays. Finally, we experiment with varying VAE latent space dimensions and demonstrate that a single real-valued scalar parameter contains sufficient information to encode our extensive IP data compilation. This new finding suggests that modeling time-domain IP data using mathematical models governed by more than one free parameter is ambiguous, whereas modeling only the average chargeability is justified. A pre-trained implementation of our model -- readily applicable to new IP data from any geolocation -- is available as open-source Python code for the applied geophysics community.
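
The average chargeability mentioned above can be illustrated with a minimal sketch. It assumes the average is a window-width-weighted mean of the chargeability values sampled along one decay curve; the abstract does not give the authors' exact definition. The function name `average_chargeability`, the `widths` argument, and the sample values are hypothetical and are not taken from the released IP-VAE code.

```python
import numpy as np


def average_chargeability(decay, widths=None):
    """Width-weighted mean of the windowed chargeability values of one decay.

    Assumption: `decay` holds one chargeability value per time window and
    `widths` holds the matching window durations. With no widths given,
    all windows are weighted equally.
    """
    decay = np.asarray(decay, dtype=float)
    if widths is None:
        widths = np.ones_like(decay)
    widths = np.asarray(widths, dtype=float)
    if decay.shape != widths.shape:
        raise ValueError("decay and widths must have the same shape")
    return float(np.sum(decay * widths) / np.sum(widths))


# Hypothetical four-window decay (values in mV/V), for illustration only.
decay = np.array([40.0, 30.0, 20.0, 10.0])
m_bar_equal = average_chargeability(decay)                     # equal windows
m_bar_weighted = average_chargeability(decay, [1.0, 1.0, 2.0, 4.0])
```

Under this assumption, a scalar such as `m_bar_equal` is the kind of single-number summary that the first latent dimension is reported to correlate with.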

Paper Structure

This paper contains 40 sections, 13 equations.