Utilizing Machine Learning to Predict Host Stars and the Key Elemental Abundances of Small Planets

Amílcar R. Torres-Quijano, Natalie R. Hinkel, Caleb H. Wheeler, Patrick A. Young, Luan Ghezzi, Augusto P. Baldo

TL;DR

The paper addresses how stellar chemical abundances relate to the presence of small planets, proposing a supervised learning approach to identify predictive elemental features. Using XGBoost on abundances from the Hypatia Catalog and planet data from the NASA Exoplanet Archive, the authors define three planet classes and seven ensembles to extract robust chemical signals, with a Golden Set to quantify predictive power. They consistently find Na and V as key features across regimes, while Mg-related ratios provide additional predictive power; overlaps between ensembles reveal a potential planet-forming chemical recipe and sub-solar Na enrichment trends among predicted hosts. The results demonstrate the feasibility of ML-driven host-star targeting to optimize future mission yields (e.g., JWST, NGRST, HWO) and offer insights into how stellar chemistry translates into planet formation, while noting data biases and uncertainties that warrant cautious interpretation and further refinement.

Abstract

Stars and their associated planets originate from the same cloud of gas and dust, making a star's elemental composition a valuable indicator for indirectly studying planetary compositions. While the connection between a star's iron (Fe) abundance and the presence of giant exoplanets is established (e.g., Gonzalez 1997; Fischer & Valenti 2005), the relationship with small planets remains unclear. The elements Mg, Si, and Fe are important in forming small planets. Employing machine learning algorithms like XGBoost, trained on the abundances (e.g., the Hypatia Catalog, Hinkel et al. 2014) of known exoplanet-hosting stars (NASA Exoplanet Archive), allows us to determine significant "features" (abundances or molar ratios) that may indicate the presence of small planets. We test on three groups of exoplanets: (a) all small, $R_{P} < 3.5\,R_{\oplus}$, (b) sub-Neptunes, $2.0\,R_{\oplus} < R_{P} < 3.5\,R_{\oplus}$, and (c) super-Earths, $1.0\,R_{\oplus} < R_{P} < 2.0\,R_{\oplus}$ -- each subdivided into 7 ensembles to test different combinations of features. We created a list of stars with $\geq 90\%$ probability of hosting small planets across all ensembles and experiments ("overlap stars"). We found abundance trends for stars hosting small planets, possibly indicating star-planet chemical interplay during formation. We also found that Na and V are key features regardless of planetary radii. We expect our results to underscore the importance of elements in exoplanet formation and machine learning's role in target selection for future NASA missions: e.g., the James Webb Space Telescope (JWST), Nancy Grace Roman Space Telescope (NGRST), Habitable Worlds Observatory (HWO) -- all of which are aimed at small planet detection.
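
As a rough illustration of the grouping and selection described in the abstract, the sketch below assigns a planet radius (in Earth radii) to the three experiment groups and picks out stars whose predicted hosting probability is at least 90% in every run. It is not the authors' code: the function names, the dictionary layout, and the star names and probabilities are hypothetical and made up for illustration, and the radius cuts are applied as strict inequalities exactly as written in the abstract.

```python
from typing import Dict, List, Set


def planet_groups(radius_earth: float) -> List[str]:
    """Return the experiment groups a planet radius (in Earth radii) falls into.

    The cuts follow the abstract literally (strict inequalities), so a planet
    at exactly 2.0 Earth radii lands only in the "small" group.
    """
    groups = []
    if radius_earth < 3.5:
        groups.append("small")
    if 2.0 < radius_earth < 3.5:
        groups.append("sub-Neptune")
    if 1.0 < radius_earth < 2.0:
        groups.append("super-Earth")
    return groups


def overlap_stars(
    probabilities: Dict[str, Dict[str, float]], threshold: float = 0.90
) -> Set[str]:
    """Return stars with probability >= threshold in every run.

    `probabilities` maps a run label (hypothetical, e.g. "exp1-ens3") to a
    mapping of star name -> predicted probability of hosting a small planet.
    """
    selected = None
    for run in probabilities.values():
        passing = {star for star, p in run.items() if p >= threshold}
        selected = passing if selected is None else selected & passing
    return selected or set()


# Made-up probabilities for three hypothetical stars in two hypothetical runs.
example = {
    "exp1-ens1": {"star_A": 0.95, "star_B": 0.91, "star_C": 0.50},
    "exp1-ens2": {"star_A": 0.92, "star_B": 0.89, "star_C": 0.99},
}
print(planet_groups(1.5))
print(overlap_stars(example))
```

Taking the intersection across runs is the strictest reading of "across all ensembles and experiments"; a star that misses the threshold in a single run is dropped.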

Paper Structure

This paper contains 18 sections, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Feature importance plots obtained for the ensembles in Experiment 1 (Small Planets). Because the PlanetPrediction algorithm performs thousands of iterations over which the feature importance scores may vary, we have employed a color-coding and pattern scheme to identify feature importance score variations. Blue coloration denotes that a particular element did not have any variations in importance score across initializations during an ensemble run. In contrast, colors other than blue indicate that those elements varied in feature importance. The hatching patterns within the lower panels (c) and (d) indicate additional smaller (sub-group) variations that occurred within the ensembles.
  • Figure 2: Similar to Fig. 1 but for the molar ratios tested in Experiment 1. Given that Fe/Mg is the most important feature for Ensemble 5, this implies that the size of a small planet's core (Fe/Mg) could be an important factor in determining the presence of a small planet (see the data section). The molar ratios of Mg/Si and Fe/Si are often the most important features in Ensemble 6, while Mg/O is the most significant in determining the presence of a small planet in Ensemble 7.
  • Figure 3: Same as Fig. 1 but for the abundance feature importance plots obtained for the Experiment 2 (sub-Neptune) ensembles. Ensemble 1 (a) shows Mg as the most important feature for the sub-Neptunes, while Mg and Fe alternate as the most important feature in Ensemble 2 (b). For Ensemble 3 (c) and Ensemble 4 (d), Na and V -- and to a lesser extent Ni -- are the most important elements for this particular class of planets, similar to Experiment 1.
  • Figure 4: Similar to Fig. 1 but for the molar ratio feature importance plots obtained for the Experiment 2 (sub-Neptune) ensembles. Ensemble 5 (a) shows variations in the molar ratios of Ti/Mg and Ca/Mg, as well as C/Mg and Si/Mg. Ensemble 6 (b) exhibits variations among multiple molar ratios (Fe/Si, Ti/Si, Mg/Si, and Ca/Si) to determine the most significant molar ratio in assessing the presence of a sub-Neptune.
  • Figure 5: Same as Fig. 1 but for the abundance feature importance plots obtained for the Experiment 3 (super-Earth) ensembles. While C becomes the second most important feature in Ensemble 2 (b), Ensemble 3 (c) and Ensemble 4 (d), and the third most important feature in Ensemble 1 (a), this result is due to the use of null values within the ensembles, as discussed in the section on null values. Therefore, the feature importance of C is not heavily considered. We note that Na remains the most important feature for Ensembles 3 (c) and 4 (d), while V remains near the top (although alternating) and Ni is now much further down. The elements V and Ti vary in importance for Ensemble 3 (c) while Al remains the fourth most important. A similar situation occurs where Al and Ti vary in importance for Ensemble 4 (d) while V remains the third most important.
  • ...and 4 more figures
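
The captions above distinguish features whose importance stays fixed across iterations from those that alternate. A minimal sketch of how such bookkeeping could be done is shown below, assuming that "variation" means a change in rank order between iterations. This is an interpretation, not the PlanetPrediction implementation; the function names and the importance scores are hypothetical and made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def rank_features(importances: Dict[str, float]) -> List[str]:
    """Order feature names from most to least important (ties broken by name)."""
    return sorted(importances, key=lambda name: (-importances[name], name))


def rank_variation(runs: List[Dict[str, float]]) -> Dict[str, Set[int]]:
    """Collect, for each feature, the set of rank positions it held across runs."""
    ranks: Dict[str, Set[int]] = defaultdict(set)
    for run in runs:
        for position, name in enumerate(rank_features(run), start=1):
            ranks[name].add(position)
    return dict(ranks)


def stable_features(runs: List[Dict[str, float]]) -> Set[str]:
    """Features that held one and the same rank in every run."""
    return {name for name, pos in rank_variation(runs).items() if len(pos) == 1}


# Made-up importance scores for two hypothetical iterations.
runs = [
    {"Na": 0.4, "V": 0.3, "Ni": 0.2, "Mg": 0.1},
    {"Na": 0.4, "V": 0.2, "Ni": 0.3, "Mg": 0.1},
]
print(rank_variation(runs))
print(stable_features(runs))
```

In this toy input, Na and Mg keep their positions while V and Ni swap second and third place, which is the kind of alternation the captions describe.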