SA-EMO: Structure-Aligned Encoder Mixture of Operators for Generalizable Full-waveform Inversion
Wang Zhenyu, Li Peiyuan, Shi Yongxiang, Wu Ruoyu, Zhang Lei
TL;DR
This work tackles the challenges of full-waveform inversion by addressing the misalignment between the seismic-waveform and velocity domains. It introduces SA-EMO, which combines a structure-aligned encoder that maps wavefields to a latent representation $\mathbf{Z}\in\mathbb{R}^{C\times H\times W}$ (e.g., $H\times W = 70\times70$) with a mixture of experts consisting of four neural operators (FNO, WNO, MNO, LNO) that operate in this latent space. An adaptive routing mechanism, using type-weighted and group-weighted fusion with a strong–weak activation strategy, combines the operator outputs to achieve robust generalization across geological types, and training is guided by a physics-aware loss that combines spatial–structural and spectral-domain terms. Experiments on OpenFWI and Marmousi2 show that SA-EMO substantially improves MAE and boundary resolution, and ablations confirm the importance of the encoder, the routing, and the hybrid loss. The approach offers a scalable, interpretable pathway toward physics-informed, generalizable seismic inversion, with potential for rapid, high-resolution subsurface imaging. Key equations include the encoding $\mathbf{Z} = E_{\theta}(\mathbf{S})$ of the seismic data $\mathbf{S}$, the operator outputs $\hat{\mathbf{V}}_i = \mathcal{O}_i(\mathbf{Z})$, and fusion rules such as $\hat{\mathbf{V}} = (1+\lambda)\sum_{g\in\mathcal{S}} \bar{\beta}_g^{\text{strong}} \hat{\mathbf{V}}_g - \lambda\sum_{g\in\mathcal{W}} \bar{\beta}_g^{\text{weak}} \hat{\mathbf{V}}_g$, where $\mathcal{S}$ and $\mathcal{W}$ denote the strong and weak expert groups.
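The strong–weak fusion rule in the TL;DR can be sketched in NumPy as below. The source does not state how the router scores experts or how the strong set $\mathcal{S}$ and weak set $\mathcal{W}$ are chosen, so this sketch assumes a softmax over hypothetical per-expert router logits, treats the top `num_strong` experts as strong, and renormalizes the weights within each group to obtain $\bar{\beta}_g$. The function name `strong_weak_fusion`, the parameters `router_logits` and `num_strong`, and the default `lam` are illustrative only, not the authors' implementation.

```python
import numpy as np


def strong_weak_fusion(expert_outputs, router_logits, num_strong=2, lam=0.1):
    """Fuse expert velocity maps with a strong-weak rule (illustrative sketch).

    expert_outputs: array of shape (G, H, W), one predicted velocity map per expert.
    router_logits:  array of shape (G,), hypothetical router scores for this sample.
    num_strong:     hypothetical number of top-scoring experts treated as "strong".
    lam:            the lambda in the fusion rule; the default value is an assumption.
    """
    outputs = np.asarray(expert_outputs, dtype=float)
    logits = np.asarray(router_logits, dtype=float)
    num_experts = outputs.shape[0]
    assert logits.shape == (num_experts,)
    assert 1 <= num_strong <= num_experts

    # Softmax routing weights (shifted by the max for numerical stability).
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Rank experts by weight: the top `num_strong` form the strong set S, the rest W.
    order = np.argsort(-weights)
    strong, weak = order[:num_strong], order[num_strong:]

    # Renormalize within each group to get beta-bar, then sum over experts.
    beta_strong = weights[strong] / weights[strong].sum()
    strong_term = np.tensordot(beta_strong, outputs[strong], axes=1)
    if weak.size == 0:
        # No weak experts: the subtracted term vanishes.
        weak_term = np.zeros_like(strong_term)
    else:
        beta_weak = weights[weak] / weights[weak].sum()
        weak_term = np.tensordot(beta_weak, outputs[weak], axes=1)

    # V_hat = (1 + lam) * sum_S beta_strong * V_g - lam * sum_W beta_weak * V_g
    return (1.0 + lam) * strong_term - lam * weak_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four synthetic 70x70 maps standing in for FNO, WNO, MNO, and LNO outputs.
    maps = rng.uniform(1500.0, 4500.0, size=(4, 70, 70))
    fused = strong_weak_fusion(maps, router_logits=[1.2, 0.3, 2.0, -0.5])
    print(fused.shape)  # (70, 70)
```

When both groups are non-empty, each group's weights sum to one, so the rule is an affine combination of the expert maps: if every expert predicts the same map, the output equals that map for any $\lambda$, and a positive $\lambda$ pushes the result away from the weak experts' consensus.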
Abstract
Full-waveform inversion (FWI) can produce high-resolution subsurface models, yet it remains inherently ill-posed, highly nonlinear, and computationally intensive. Although recent deep learning and numerical acceleration methods have improved speed and scalability, they typically rely on a single CNN architecture or a single neural operator, which struggles to generalize to unknown or complex geological settings and cannot effectively distinguish among diverse geological types. To address these issues, we propose a Structure-Aligned Encoder-Mixture-of-Operators (SA-EMO) architecture for velocity-field inversion under unknown subsurface structures. First, a structure-aligned encoder maps high-dimensional seismic wavefields into a physically consistent latent space, thereby eliminating the spatio-temporal mismatch between the waveform and velocity domains, recovering high-frequency components, and enhancing feature generalization. Then, an adaptive routing mechanism selects and fuses multiple neural-operator experts, including spectral, wavelet, multiscale, and local operators, to predict the velocity model. We systematically evaluate our approach on the OpenFWI benchmark and the Marmousi2 dataset. Results show that SA-EMO significantly outperforms traditional CNN and single-operator methods, reducing MAE by 58.443% on average and improving boundary resolution by 10.308%. Ablation studies further show that the structure-aligned encoder, the expert-fusion mechanism, and the routing module each contribute substantially to the performance gains. This work introduces a new paradigm for efficient, scalable, and physically interpretable full-waveform inversion.
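The TL;DR describes a physics-aware loss with spatial–structural and spectral-domain terms, and the abstract credits the model with recovering high-frequency components. A minimal NumPy sketch of such a hybrid objective for 2-D velocity maps follows. Every specific choice here is an assumption, because the source does not give the exact formulation: MAE (the metric reported above) as the spatial term, a finite-difference gradient loss as a stand-in for the structural term, an FFT amplitude-spectrum loss as the spectral term, and the hypothetical weights `alpha` and `gamma`.

```python
import numpy as np


def hybrid_loss(v_pred, v_true, alpha=1.0, gamma=0.1):
    """Illustrative spatial + structural + spectral loss for 2-D velocity maps.

    The three terms and the weights alpha/gamma are assumptions for this sketch,
    not the formulation used by SA-EMO.
    """
    v_pred = np.asarray(v_pred, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    assert v_pred.shape == v_true.shape and v_pred.ndim == 2

    # Spatial term: mean absolute error (MAE).
    mae = np.mean(np.abs(v_pred - v_true))

    # Structural term (stand-in): match finite-difference gradients along both
    # axes, so misplaced layer boundaries (where velocity jumps) are penalized.
    gz_p, gx_p = np.gradient(v_pred)
    gz_t, gx_t = np.gradient(v_true)
    grad = np.mean(np.abs(gz_p - gz_t) + np.abs(gx_p - gx_t))

    # Spectral term: L1 distance between 2-D FFT amplitude spectra, so that
    # discrepancies in every frequency bin, including high ones, contribute explicitly.
    spec_p = np.abs(np.fft.rfft2(v_pred))
    spec_t = np.abs(np.fft.rfft2(v_true))
    spec = np.mean(np.abs(spec_p - spec_t))

    total = mae + alpha * grad + gamma * spec
    return total, {"mae": mae, "grad": grad, "spec": spec}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v_true = rng.uniform(1500.0, 4500.0, size=(70, 70))  # synthetic "ground truth"
    v_pred = v_true + rng.normal(0.0, 50.0, size=(70, 70))  # noisy "prediction"
    total, terms = hybrid_loss(v_pred, v_true)
    print(total, terms)
```

Keeping the terms separate in the returned dictionary makes it easy to monitor each component during training or to rebalance `alpha` and `gamma` when one term dominates.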
