Understanding Exoplanet Habitability: A Bayesian ML Framework for Predicting Atmospheric Absorption Spectra

Vasuda Trehan, Kevin H. Knuth, M. J. Way

TL;DR

The paper tackles the challenge of predicting exoplanet atmospheric absorption spectra from roughly 30 planetary parameters to enable Bayesian parameter inference. It proposes a forward surrogate based on PCHIP splines, to be trained on real Earth spectra and ROCKE-3D synthetic data, in which each bin height $h_b$ depends on the parameters via $h_b = F_b(p_1, \dots, p_{30})$. A proof of concept is demonstrated in a reduced one-parameter, six-bin setting. Nested sampling is used to infer spline knot positions and heights, and Bayesian Adaptive Exploration identifies the most informative sampling locations so that new data reduce predictive uncertainty. The results show accurate recovery of the spectral shapes with quantified uncertainty, while highlighting the challenges of scaling to the full 30-parameter, 20-bin problem and the need for more scalable forward models.

Abstract

The evolution of space technology in recent years, fueled by advancements in computing such as Artificial Intelligence (AI) and machine learning (ML), has profoundly transformed our capacity to explore the cosmos. Missions like the James Webb Space Telescope (JWST) have made information about distant objects more easily accessible, resulting in extensive amounts of valuable data. As part of this work-in-progress study, we are working to create an atmospheric absorption spectrum prediction model for exoplanets. The eventual model will be based on both collected observational spectra and synthetic spectral data generated by the ROCKE-3D general circulation model (GCM) developed by the climate modeling program at NASA's Goddard Institute for Space Studies (GISS). In this initial study, spline curves are used to describe the bin heights of simulated atmospheric absorption spectra as a function of one of the values of the planetary parameters. Bayesian Adaptive Exploration is then employed to identify areas of the planetary parameter space for which more data are needed to improve the model. The resulting system will be used as a forward model so that planetary parameters can be inferred given a planet's atmospheric absorption spectrum. This work is expected to contribute to a better understanding of exoplanetary properties and general exoplanet climates and habitability.
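The spline-based forward model described above can be illustrated with a minimal sketch. It is not the authors' implementation: the knot positions, the knot heights, and the names `KNOT_P`, `BIN_KNOT_HEIGHTS`, and `predict_spectrum` are hypothetical, and the numbers are made up. The sketch implements a shape-preserving cubic Hermite interpolant in plain Python, using a weighted harmonic mean for the interior knot slopes and, as a simplification, one-sided secant slopes at the two end knots. In practice a library routine such as `scipy.interpolate.PchipInterpolator` would likely be used instead; its end-slope treatment differs from the one assumed here.

```python
from bisect import bisect_right


def pchip_slopes(x, y):
    """Knot derivatives from the weighted harmonic mean of neighbouring secants."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]            # knot spacings
    d = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]   # secant slopes
    m = [0.0] * n
    for i in range(1, n - 1):
        # Same-sign secants: no local extremum at this knot, so use a
        # non-zero slope; otherwise the slope stays 0 to avoid overshoot.
        if d[i - 1] * d[i] > 0:
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
    # Simplifying assumption: one-sided secant slopes at the end knots.
    m[0], m[-1] = d[0], d[-1]
    return m


def pchip_eval(x, y, m, t):
    """Evaluate the cubic Hermite interpolant at t (x[0] <= t <= x[-1])."""
    if not x[0] <= t <= x[-1]:
        raise ValueError("t lies outside the knot range")
    i = min(bisect_right(x, t) - 1, len(x) - 2)   # interval containing t
    h = x[i + 1] - x[i]
    s = (t - x[i]) / h                            # local coordinate in [0, 1]
    h00 = (1 + 2 * s) * (1 - s) ** 2
    h10 = s * (1 - s) ** 2
    h01 = s * s * (3 - 2 * s)
    h11 = s * s * (s - 1)
    return h00 * y[i] + h10 * h * m[i] + h01 * y[i + 1] + h11 * h * m[i + 1]


# Hypothetical knots in a single planetary parameter p, scaled to [0, 1].
KNOT_P = [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical knot heights, one row per spectral bin (six bins).
BIN_KNOT_HEIGHTS = [
    [0.10, 0.30, 0.60, 0.80, 0.90],  # rising
    [0.90, 0.70, 0.40, 0.20, 0.10],  # falling
    [0.20, 0.60, 0.90, 0.60, 0.20],  # single peak
    [0.80, 0.40, 0.10, 0.40, 0.80],  # single dip
    [0.10, 0.50, 0.50, 0.50, 0.90],  # plateau
    [0.50, 0.50, 0.50, 0.50, 0.50],  # flat
]
BIN_SLOPES = [pchip_slopes(KNOT_P, heights) for heights in BIN_KNOT_HEIGHTS]


def predict_spectrum(p):
    """Forward model: parameter value p -> list of six bin heights."""
    return [
        pchip_eval(KNOT_P, heights, slopes, p)
        for heights, slopes in zip(BIN_KNOT_HEIGHTS, BIN_SLOPES)
    ]


spectrum = predict_spectrum(0.4)
```

Because the interpolant is shape-preserving, a bin whose knot heights rise monotonically stays monotone between knots, and a peaked bin does not overshoot its highest knot. That property is one plausible reason to prefer this spline family when the knot positions and heights are themselves being inferred.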

Paper Structure

This paper contains 7 sections, 9 equations, 6 figures.

Figures (6)

  • Figure S1: (A). The machine learning system will be initially trained on recorded present spectra and historic synthetic spectra generated by ROCKE-3D GCM simulations at NASA's GISS. (B). Once trained, the system can serve as a predictive forward model that will enable a Bayesian inference engine to estimate planetary parameters from recorded exoplanetary spectra.
  • Figure S2: (A). A synthetic atmospheric absorption spectrum, ranging from the visible (VIS) range through to near-infrared (NIR), of Archean snowball Earth generated using ROCKE-3D (Way et al., 2023). (B). A summary of the spectrum as a set of 20 discrete bins.
  • Figure S3: An illustration of the synthetic spectrum. There are six synthetic spectral bins at x-values: $x = \{0.05, 0.30, 0.35, 0.65, 0.70, 0.95\}$. The spectrum is represented by six spectral bins with amplitudes defined by the functions illustrated by the corresponding curves.
  • Figure S4: (A--F). An illustration of the estimated mean functions (dashed curves) and standard deviations (colored/shaded regions) for the six spectral bins. The true function from which the data were generated is shown as the solid curve (see Figure S3). The solid vertical lines indicate the positions of the data. The red vertical dashed lines illustrate the locations of greatest uncertainty, which indicate where further spectral measurements would be most informative. (G). An illustration of the sum of the squared residuals between the estimated and true functional relationships. (H). An illustration of the sum of the squared differences (summed over bins) between the estimated functions and the true functions. Larger values indicate regions of (overall) greater uncertainty proportional to the information that is to be gained by obtaining data there.
  • Figure S5: (A--F). An illustration of the estimated mean functions (dashed lines) and standard deviations (colored/shaded regions) for the six spectral bins with an additional data spectrum at the location $x = 0.85$, indicated by the black vertical dotted line. (G) illustrates that the fit has been improved in the region where $x > 0.85$. (H). The sum of the squared deviations shows that collecting data near $x = 0.15$ and $x = 0.51$ would still be most informative, although the shapes and positions of these peaks have shifted slightly: the red vertical dashed lines (indicating the initial positions expected to be most informative) no longer coincide exactly with the peaks of the summed squared deviations.
  • ...and 1 more figure
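The selection step described in the Figure S4 and S5 captions, choosing where a further spectrum would be most informative, can be sketched as follows. This is an illustrative toy and not the authors' procedure. It assumes that uncertainty at a parameter value is measured by the across-sample variance of the predicted bin heights, summed over bins. The ensemble is a made-up stand-in for posterior samples (which the paper obtains with nested sampling), built so that the spread vanishes at the data locations. The names `DATA_X`, `GRID`, `gap`, `toy_ensemble`, `summed_variance`, and `most_informative` are all hypothetical.

```python
def summed_variance(ensemble):
    """ensemble[k][b][j] is the height of bin b at grid point j under sample k.

    Returns, for each grid point, the across-sample variance summed over bins.
    """
    n_samples = len(ensemble)
    n_bins = len(ensemble[0])
    n_grid = len(ensemble[0][0])
    total = [0.0] * n_grid
    for b in range(n_bins):
        for j in range(n_grid):
            vals = [sample[b][j] for sample in ensemble]
            mean = sum(vals) / n_samples
            total[j] += sum((v - mean) ** 2 for v in vals) / n_samples
    return total


def most_informative(grid, ensemble):
    """Return the grid location with the largest summed variance."""
    total = summed_variance(ensemble)
    j_best = max(range(len(grid)), key=lambda j: total[j])
    return grid[j_best], total


# Hypothetical data locations and evaluation grid in one scaled parameter.
DATA_X = [0.0, 0.4, 1.0]
GRID = [j / 20 for j in range(21)]


def gap(x):
    """Zero at every data location, largest midway between neighbours."""
    for left, right in zip(DATA_X, DATA_X[1:]):
        if left <= x <= right:
            return (x - left) * (right - x)
    raise ValueError("x lies outside the data range")


def toy_ensemble(n_bins=6, amplitudes=(-1.0, 0.0, 1.0)):
    """Made-up sample curves: they agree at the data and fan out in between."""
    return [
        [[0.5 + a * (b + 1) * gap(x) for x in GRID] for b in range(n_bins)]
        for a in amplitudes
    ]


x_next, total = most_informative(GRID, toy_ensemble())
print(f"next spectrum suggested near x = {x_next:.2f}")
```

In this toy the suggested location falls in the middle of the widest gap between data points, where the sample curves disagree most. Adding a spectrum there and repeating the inference would give one iteration of an explore-then-infer loop of the kind the captions describe.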