Flow Matching for Atmospheric Retrieval of Exoplanets: Where Reliability meets Adaptive Noise Levels

Timothy D. Gebhard, Jonas Wildberger, Maximilian Dax, Annalena Kofler, Daniel Angerhausen, Sascha P. Quanz, Bernhard Schölkopf

TL;DR

This work advances exoplanet atmospheric retrieval by introducing flow matching posterior estimation (FMPE), which is based on continuous normalizing flows, and combining it with importance sampling (IS) to verify results and estimate the Bayesian evidence. It also conditions the models on the assumed noise level of a spectrum, so that a single trained model can adapt to different noise models without retraining. On simulated data, FMPE-IS and NPE-IS perform on par with nested sampling across a range of noise levels, while FMPE trains about 3 times faster than NPE and yields higher IS efficiencies. The combination delivers a fast, amortized, and parallelizable framework with built-in verification and evidence-based model comparison, which is relevant for instrument design studies and robust atmospheric inference.

Abstract

Inferring atmospheric properties of exoplanets from observed spectra is key to understanding their formation, evolution, and habitability. Since traditional Bayesian approaches to atmospheric retrieval (e.g., nested sampling) are computationally expensive, a growing number of machine learning (ML) methods such as neural posterior estimation (NPE) have been proposed. We seek to make ML-based atmospheric retrieval (1) more reliable and accurate with verified results, and (2) more flexible with respect to the underlying neural networks and the choice of the assumed noise models. First, we adopt flow matching posterior estimation (FMPE) as a new ML approach to atmospheric retrieval. FMPE maintains many advantages of NPE, but provides greater architectural flexibility and scalability. Second, we use importance sampling (IS) to verify and correct ML results, and to compute an estimate of the Bayesian evidence. Third, we condition our ML models on the assumed noise level of a spectrum (i.e., error bars), thus making them adaptable to different noise models. Both our noise level-conditional FMPE and NPE models perform on par with nested sampling across a range of noise levels when tested on simulated data. FMPE trains about 3 times faster than NPE and yields higher IS efficiencies. IS successfully corrects inaccurate ML results, identifies model failures via low efficiencies, and provides accurate estimates of the Bayesian evidence. FMPE is a powerful alternative to NPE for fast, amortized, and parallelizable atmospheric retrieval. IS can verify results, thus helping to build confidence in ML-based approaches, while also facilitating model comparison via the evidence ratio. Noise level conditioning allows design studies for future instruments to be scaled up, for example, in terms of the range of signal-to-noise ratios.
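To make the role of importance sampling more concrete, the following is a minimal sketch of how samples from an approximate posterior can be reweighted, and how the sampling efficiency and the evidence estimate follow from the weights. It is not the authors' implementation: the function names (`log_normal`, `importance_sampling_summary`) are hypothetical, and the one-parameter Gaussian toy model with noise level `sigma` stands in for a real atmospheric simulator and a trained FMPE or NPE model. The weights are taken as prior times likelihood divided by proposal density, the efficiency as the normalized effective sample size, and the evidence as the mean unnormalized weight.

```python
import numpy as np


def log_normal(x, mean, std):
    """Log-density of a 1D Gaussian (toy stand-in for prior, likelihood, proposal)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


def importance_sampling_summary(log_prior, log_likelihood, log_proposal):
    """Return normalized weights, sampling efficiency, and log-evidence estimate.

    All inputs are arrays evaluated at samples drawn from the proposal.
    """
    log_w = log_prior + log_likelihood - log_proposal
    shift = np.max(log_w)                # subtract the maximum for numerical stability
    w = np.exp(log_w - shift)
    n = w.size
    efficiency = np.sum(w) ** 2 / (n * np.sum(w ** 2))  # effective sample size / n
    log_evidence = shift + np.log(np.mean(w))           # log of the mean raw weight
    return w / np.sum(w), efficiency, log_evidence


# Toy setup (assumption): prior N(0, 1), likelihood N(x_obs | theta, sigma^2).
rng = np.random.default_rng(0)
x_obs, sigma = 1.0, 0.5
post_mean = x_obs / (1 + sigma ** 2)
post_std = np.sqrt(sigma ** 2 / (1 + sigma ** 2))

# A deliberately imperfect proposal, mimicking a slightly inaccurate ML posterior.
q_mean, q_std = post_mean + 0.1, 1.2 * post_std
theta = rng.normal(q_mean, q_std, size=20_000)

weights, efficiency, log_z = importance_sampling_summary(
    log_prior=log_normal(theta, 0.0, 1.0),
    log_likelihood=log_normal(x_obs, theta, sigma),
    log_proposal=log_normal(theta, q_mean, q_std),
)

# For this toy model the evidence is known in closed form: N(x_obs | 0, 1 + sigma^2).
log_z_exact = log_normal(x_obs, 0.0, np.sqrt(1 + sigma ** 2))
is_mean = np.sum(weights * theta)  # importance-weighted posterior mean

print(f"efficiency = {efficiency:.3f}")
print(f"log Z (IS) = {log_z:.4f}, log Z (exact) = {log_z_exact:.4f}")
print(f"posterior mean (IS) = {is_mean:.4f}, exact = {post_mean:.4f}")
```

In this sketch, a proposal that matches the true posterior exactly would give constant weights and an efficiency of 1, while a poor proposal drives the efficiency toward 0, which is the sense in which low efficiencies can flag model failures. Because the likelihood takes `sigma` as an explicit input, the same reweighting applies unchanged for any assumed noise level.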

Paper Structure

This paper contains 37 sections, 21 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: Discrete Normalizing Flow (DNF)
  • Figure 2: Continuous Normalizing Flow (CNF)
  • Figure 4: Illustration of a time-dependent vector field (indicated by the gray arrows) continuously transforming samples from a standard 2D Gaussian at $t=0$ into samples from a more complex, star-shaped distribution at $t=1$. For simplicity, we show an unconditional example here; for an atmospheric retrieval, the vector field would not only depend on $t$ but also on the observed spectrum $x$ and the assumed noise level $\sigma$. The size of the arrows has been rescaled for visual purposes.
  • Figure 5: Comparison of the 1D and 2D marginal posteriors for the noise-free benchmark spectrum from nested sampling (as implemented by nautilus and MultiNest), FMPE, and NPE. For the latter two, we include the results with and without importance sampling. For visual purposes, we apply some light Gaussian smoothing to the histograms. Furthermore, we only show six selected parameters here; the full version featuring all 16 parameters is found in the appendix.
  • Figure 6: [Fe/H]
  • ...and 13 more figures