
$\texttt{Exoformer}$: Accelerating Bayesian atmospheric retrievals with transformer neural networks

L. Pagliaro, T. Zingales, G. Piotto, I. Giovannini, G. Mantovan

Abstract

Computationally expensive and time-consuming Bayesian atmospheric retrievals pose a significant bottleneck for the rapid analysis of high-quality exoplanetary spectra from present and next-generation space telescopes, such as JWST and Ariel. As these missions demand more complex atmospheric models to fully characterize the spectral features they uncover, they will benefit from data-driven analysis techniques such as machine and deep learning. We introduce and detail a novel approach that uses a transformer-based neural network ($\texttt{Exoformer}$) to rapidly generate informative prior distributions for atmospheric transmission spectra of hot Jupiters. We demonstrate the effectiveness of $\texttt{Exoformer}$ using both simulated observations and real JWST data of WASP-39b and WASP-17b within the TauREx retrieval framework, leveraging the nested sampling algorithm. By replacing standard uniform priors with $\texttt{Exoformer}$-derived informative priors, our method accelerates nested-sampling retrievals by a factor of 3-8 in the tested cases. Crucially, the retrieved parameters and the best-fit models remain consistent with results from classical methods. Furthermore, we confirm the statistical consistency of the two retrieval approaches by comparing their log-Bayesian evidence, obtaining absolute values of each Bayes factor $|\Delta\log{Z}|<5$, i.e., no strong preference for either model on common interpretive scales. This hybrid approach significantly enhances the efficiency of atmospheric retrieval tools without compromising their accuracy, paving the way for more rapid analysis of complex exoplanetary spectra and enabling the integration of more realistic atmospheric models.
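The core idea, replacing uniform priors with network-derived informative priors in a nested-sampling retrieval, can be illustrated with a minimal sketch. Nested samplers of the kind wrapped by retrieval frameworks such as TauREx draw points from the unit hypercube and map them to parameter space via a prior transform. The example below is a simplified assumption, not the paper's actual implementation: it contrasts a standard uniform transform with an informative prior approximated as a Gaussian centred on a hypothetical network prediction, mapped through the inverse CDF. All parameter names and values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def uniform_transform(u, lo, hi):
    """Standard uniform prior over [lo, hi]: maps a unit-cube draw u in [0, 1]
    linearly onto the parameter range."""
    return lo + u * (hi - lo)

def informative_transform(u, mu, sigma):
    """Informative prior sketch (assumption): a Gaussian centred on a
    network-predicted value mu with width sigma, sampled via the inverse CDF."""
    return norm.ppf(u, loc=mu, scale=sigma)

# Illustrative single parameter, e.g. a log10 abundance (values hypothetical):
u = 0.5  # centre of the unit cube
print(uniform_transform(u, -12.0, -1.0))    # midpoint of the wide uniform range: -6.5
print(informative_transform(u, -3.5, 0.5))  # the network-predicted centre: -3.5
```

Because the informative transform concentrates prior mass near the network's prediction, the sampler spends far fewer likelihood evaluations exploring implausible regions, which is the mechanism behind the reported speed-up.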


Paper Structure

This paper contains 23 sections, 17 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Schematic of the Exoformer architecture. Each box represents a layer described in Section \ref{section:exoformer}. Inside the dashed box, the layers forming a single encoder block are indicated. Multiple encoder blocks repeated sequentially form the transformer encoder.
  • Figure 2: Left plot: Training and validation losses as a function of the training step. Right plot: Learning rate trend as determined by the learning rate schedule applied during training.
  • Figure 3: Preprocessing phases on the test planet spectrum in Table \ref{table:test-case-planet}. Upper plot: Analytical spectrum computed using the TauREx forward model. Middle plot: Analytical spectrum binned to the custom grid and normalization bands. Bottom plot: Interpolated spectrum after normalization.
  • Figure 4: Simulated NIRSpec PRISM observation of the transmission spectrum in Fig. \ref{fig:preprocessing}. The observational data points (black dots) are binned to the native resolution of NIRSpec PRISM ($R=100$) and superimposed on the original (red line) TauREx analytical spectrum.
  • Figure 5: Posterior distributions and ground truth values (red lines) of the seven parameters. The retrieval was performed using Exoformer on the NIRSpec PRISM simulation. The dashed lines indicate the median of the distribution, while the dashed-dotted lines indicate the $1\sigma$ intervals.
  • ...and 6 more figures