Utilizing Machine Learning to Predict Host Stars and the Key Elemental Abundances of Small Planets
Amílcar R. Torres-Quijano, Natalie R. Hinkel, Caleb H. Wheeler, Patrick A. Young, Luan Ghezzi, Augusto P. Baldo
TL;DR
The paper addresses how stellar chemical abundances relate to the presence of small planets, proposing a supervised learning approach to identify predictive elemental features. Using XGBoost on abundances from the Hypatia Catalog and planet data from the NASA Exoplanet Archive, the authors define three planet classes and seven ensembles to extract robust chemical signals, and use a Golden Set to quantify predictive power. They consistently find Na and V to be key features across all radius regimes, while Mg-related ratios provide additional predictive power. Overlaps between ensembles point to a potential planet-forming chemical recipe and reveal a trend toward sub-solar Na abundances among predicted hosts. The results demonstrate the feasibility of ML-driven host-star targeting to optimize the yields of future missions (e.g., JWST, NGRST, HWO) and offer insight into how stellar chemistry translates into planet formation, while noting data biases and uncertainties that warrant cautious interpretation and further refinement.
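Mg-related ratios such as Mg/Si are molar (number) ratios, which can be derived from bracket abundances [X/H] given a solar normalization. The sketch below shows that standard conversion; it is an illustration, not the authors' code. The function name `molar_ratio` is hypothetical, and the solar log-epsilon values (7.60 for Mg, 7.51 for Si) are illustrative assumptions: substitute whichever solar normalization the abundances were actually placed on.

```python
def molar_ratio(x_h: float, y_h: float,
                solar_log_eps_x: float, solar_log_eps_y: float) -> float:
    """Return the stellar molar (number) ratio N_X / N_Y.

    x_h, y_h: bracket abundances [X/H] and [Y/H] in dex.
    solar_log_eps_x, solar_log_eps_y: solar log-epsilon abundances,
    log10(N_X/N_H) + 12, for the chosen solar normalization.
    """
    # [X/H] = log10(N_X/N_H)_star - log10(N_X/N_H)_sun, so the H terms cancel:
    # log10(N_X/N_Y)_star = ([X/H] - [Y/H]) + (log eps_X - log eps_Y)_sun
    return 10 ** ((x_h - y_h) + (solar_log_eps_x - solar_log_eps_y))


# Illustrative solar values (assumption): log eps(Mg) = 7.60, log eps(Si) = 7.51.
SUN_MG, SUN_SI = 7.60, 7.51

print(round(molar_ratio(0.0, 0.0, SUN_MG, SUN_SI), 3))    # solar Mg/Si -> 1.23
print(round(molar_ratio(0.10, 0.05, SUN_MG, SUN_SI), 3))  # Mg-enhanced star -> 1.38
```

Because the hydrogen terms cancel, only the difference [Mg/H] - [Si/H] (that is, [Mg/Si]) and the solar Mg/Si ratio enter the result.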
Abstract
Stars and their associated planets originate from the same cloud of gas and dust, making a star's elemental composition a valuable indicator for indirectly studying planetary compositions. While the connection between a star's iron (Fe) abundance and the presence of giant exoplanets is established (e.g., Gonzalez 1997; Fischer & Valenti 2005), the relationship with small planets remains unclear. The elements Mg, Si, and Fe are important in forming small planets. By training machine learning algorithms such as XGBoost on the elemental abundances of known exoplanet-hosting stars (e.g., from the Hypatia Catalog, Hinkel et al. 2014; hosts identified in the NASA Exoplanet Archive), we determine which "features" (abundances or molar ratios) may indicate the presence of small planets. We test three groups of exoplanets: (a) all small planets, $R_{P} < 3.5\,R_{\oplus}$; (b) sub-Neptunes, $2.0\,R_{\oplus} < R_{P} < 3.5\,R_{\oplus}$; and (c) super-Earths, $1.0\,R_{\oplus} < R_{P} < 2.0\,R_{\oplus}$. Each group is subdivided into 7 ensembles that test different combinations of features. We compiled a list of stars with a $\geq 90\%$ probability of hosting small planets across all ensembles and experiments ("overlap stars"). We found abundance trends among stars hosting small planets, possibly indicating star-planet chemical interplay during formation. We also found that Na and V are key features regardless of planetary radius. We expect our results to underscore the importance of elemental abundances in exoplanet formation and the role of machine learning in target selection for future NASA missions, e.g., the James Webb Space Telescope (JWST), the Nancy Grace Roman Space Telescope (NGRST), and the Habitable Worlds Observatory (HWO), all of which are aimed at small-planet detection.
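Once a classifier has produced a per-star probability of hosting a small planet (with XGBoost's scikit-learn interface this would come from `XGBClassifier.predict_proba`), assigning radius groups and selecting "overlap stars" are simple post-processing steps. The sketch below is a hedged illustration, not the authors' pipeline: the names `radius_groups` and `overlap_stars`, the run labels, and the probabilities are hypothetical, and it assumes a star counts as an overlap star only if it is scored in every run. With the strict inequalities quoted above, a planet of exactly $2.0\,R_{\oplus}$ falls in neither subgroup, and a planet below $1.0\,R_{\oplus}$ is "small" but neither a sub-Neptune nor a super-Earth.

```python
from typing import Dict, List


def radius_groups(radius_earth: float) -> List[str]:
    """Radius groups (radius in Earth radii) that a planet falls into,
    using the strict inequalities quoted in the abstract."""
    groups = []
    if radius_earth < 3.5:
        groups.append("small")
    if 2.0 < radius_earth < 3.5:
        groups.append("sub-Neptune")
    if 1.0 < radius_earth < 2.0:
        groups.append("super-Earth")
    return groups


def overlap_stars(probabilities: Dict[str, Dict[str, float]],
                  threshold: float = 0.90) -> List[str]:
    """Stars whose predicted host probability is >= threshold in every run.

    probabilities maps a run label (one ensemble within one experiment) to
    {star_name: predicted probability of hosting a small planet}.
    A star missing from any run is not counted as an overlap star.
    """
    runs = list(probabilities.values())
    if not runs:
        return []
    # Keep only stars scored in every run, then require the threshold in each.
    common = set(runs[0]).intersection(*runs[1:])
    return sorted(s for s in common if all(run[s] >= threshold for run in runs))


# Hypothetical probabilities for three stars in two runs.
probs = {
    "small/ensemble_1": {"star_A": 0.95, "star_B": 0.91, "star_C": 0.40},
    "small/ensemble_2": {"star_A": 0.93, "star_B": 0.85, "star_C": 0.97},
}
print(radius_groups(1.5))    # ['small', 'super-Earth']
print(radius_groups(2.0))    # ['small']
print(overlap_stars(probs))  # ['star_A']
```

Requiring the threshold in every run, rather than on an averaged probability, mirrors the "across all ensembles and experiments" criterion: a star that is a confident host in one ensemble but not another (like `star_B` or `star_C` here) is excluded.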
