SA-EMO: Structure-Aligned Encoder Mixture of Operators for Generalizable Full-waveform Inversion

Wang Zhenyu, Li Peiyuan, Shi Yongxiang, Wu Ruoyu, Zhang Lei

TL;DR

This work tackles the challenges of full-waveform inversion by addressing the misalignment between the seismic waveform and velocity domains. It introduces SA-EMO, which combines a structure-aligned encoder that maps wavefields to a latent space $\mathbf{Z}$ with dimensions $C\times H\times W$ (e.g., $H\times W = 70\times70$) and a mixture of experts consisting of four neural operators (FNO, WNO, MNO, LNO) operating in that latent space. An adaptive routing mechanism, using type-weighted and group-weighted fusion with a strong–weak activation strategy, fuses the operator outputs to achieve robust cross-type generalization. Training is guided by a physics-aware loss that combines spatial–structural and spectral-domain terms. Experiments on OpenFWI and Marmousi2 show that SA-EMO substantially improves MAE and boundary resolution, and ablations confirm the importance of the encoder, the routing, and the hybrid loss. The approach offers a scalable, interpretable pathway to physics-informed, generalizable seismic inversion with potential for rapid, high-resolution subsurface imaging. Key equations include the encoding $\mathbf{Z} = E_{\theta}(\mathbf{S})$, the operator outputs $\hat{\mathbf{V}}_i = \mathcal{O}_i(\mathbf{Z})$, and fusion rules such as $\hat{\mathbf{V}} = (1+\lambda)\sum_{g\in\mathcal{S}} \bar{\beta}_g^{\text{strong}} \hat{\mathbf{V}}_g - \lambda\sum_{g\in\mathcal{W}} \bar{\beta}_g^{\text{weak}} \hat{\mathbf{V}}_g$, where the two sums run over the strong ($\mathcal{S}$) and weak ($\mathcal{W}$) operator groups.
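
To make the strong–weak fusion rule concrete, the following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions that the summary does not state: the strong set is taken to be the `num_strong` experts with the highest routing scores, the barred weights are renormalised to sum to 1 within each set, and routing scores are strictly positive (e.g. softmax outputs). The names `strong_weak_fusion`, `num_strong`, and `lam` are hypothetical, and the type-weighted part of the routing is not modelled.

```python
import numpy as np

def strong_weak_fusion(preds, weights, num_strong=2, lam=0.1):
    """Fuse expert predictions as (1 + lam) * strong_mix - lam * weak_mix.

    preds:   array of shape (G, H, W), one predicted velocity map per expert.
    weights: array of shape (G,), strictly positive routing scores.
    num_strong, lam: hypothetical hyperparameters (assumptions, not from the paper).
    """
    preds = np.asarray(preds, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Assumption: the strong set holds the top-scoring experts, the weak set the rest.
    order = np.argsort(-weights)
    strong, weak = order[:num_strong], order[num_strong:]

    # Assumption: barred weights are renormalised to sum to 1 inside each set.
    b_strong = weights[strong] / weights[strong].sum()
    fused = (1.0 + lam) * np.tensordot(b_strong, preds[strong], axes=1)

    if weak.size:  # with no weak experts only the amplified strong term remains
        b_weak = weights[weak] / weights[weak].sum()
        fused -= lam * np.tensordot(b_weak, preds[weak], axes=1)
    return fused

# Usage: four random maps stand in for the FNO, WNO, MNO, and LNO outputs.
rng = np.random.default_rng(0)
preds = rng.normal(size=(4, 70, 70))
weights = np.array([0.4, 0.3, 0.2, 0.1])
fused = strong_weak_fusion(preds, weights)
print(fused.shape)  # (70, 70)
```

Under the normalisation assumed here, and as long as both sets are non-empty, the coefficients sum to $(1+\lambda) - \lambda = 1$, so the fused output equals the common prediction whenever all experts agree. The subtraction matters only where strong and weak experts disagree, and there it pushes the result away from the weak experts' estimate.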

Abstract

Full-waveform inversion (FWI) can produce high-resolution subsurface models, yet it remains inherently ill-posed, highly nonlinear, and computationally intensive. Although recent deep learning and numerical acceleration methods have improved speed and scalability, they often rely on single CNN architectures or single neural operators, which struggle to generalize in unknown or complex geological settings and are ineffective at distinguishing diverse geological types. To address these issues, we propose a Structure-Aligned Encoder-Mixture-of-Operators (SA-EMO) architecture for velocity-field inversion under unknown subsurface structures. First, a structure-aligned encoder maps high-dimensional seismic wavefields into a physically consistent latent space, thereby eliminating spatio-temporal mismatch between the waveform and velocity domains, recovering high-frequency components, and enhancing feature generalization. Then, an adaptive routing mechanism selects and fuses multiple neural-operator experts, including spectral, wavelet, multiscale, and local operators, to predict the velocity model. We systematically evaluate our approach on the OpenFWI benchmark and the Marmousi2 dataset. Results show that SA-EMO significantly outperforms traditional CNN or single-operator methods, achieving an average MAE reduction of approximately 58.443% and an improvement in boundary resolution of about 10.308%. Ablation studies further reveal that the structure-aligned encoder, the expert-fusion mechanism, and the routing module each contribute markedly to the performance gains. This work introduces a new paradigm for efficient, scalable, and physically interpretable full-waveform inversion.

Paper Structure

This paper contains 45 sections, 16 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Cross-domain SSIM comparison of data-driven inversion frameworks.
  • Figure 2: Overview of the proposed Structure-Aligned Encoder–Mixture-of-Operators (SA-EMO) framework. SA-EMO integrates four synergistic components: (1) a structure-aligned encoder that maps the seismic wavefield to a physically consistent latent domain; (2) a set of complementary neural operator experts capturing global-to-local geological features; (3) an adaptive routing mechanism for structure-aware expert fusion; and (4) a hybrid spatial–spectral–type-aware loss for unified supervision.
  • Figure 3: Visualization of structure-aligned encoder and Fourier spectra on CurveVel-B. Each row corresponds to one test sample. The first four columns show the encoder feature maps (channels 40, 39, 114, and 62) exhibiting layer-consistent activations; the next columns show their power spectra (log$_{10}$ scale), dominated by low frequencies ($r^{\ast}\!\approx\!0.02$) with HF/LF$<0.1$. The rightmost panels compare input, target, and predicted spectra, demonstrating perfect frequency alignment ($f_\text{pred}=f_\text{gt}=485.6$).