Earth-like planet predictor: A machine learning approach

Jeanne Davoult, Romain Eltschinger, Yann Alibert

TL;DR

This work tackles the exoplanet search bottleneck by predicting which stars are most likely to host an Earth-like planet (ELP) using a Random Forest classifier trained on thousands of synthetic planetary systems from the Bern model. By imposing a simple radial-velocity bias and comparing multiple feature strategies, the authors achieve high precision ($\approx 0.99$) on synthetic tests and identify a set of real systems (about $44$) with architectures conducive to stable ELPs, validated through Hill-stability checks. The approach demonstrates the viability of theory-guided machine learning to prioritize observations for future missions like PLATO and LIFE, while acknowledging limitations of the Bern model and biases that warrant further refinement and cross-validation with alternative formation models.

Abstract

Searching for planets analogous to Earth in terms of mass and equilibrium temperature is currently the first step in the quest for habitable conditions outside our Solar System and, ultimately, the search for life in the universe. Future missions such as PLATO or LIFE will begin to detect and characterise these small, cold planets, dedicating significant observation time to them. The aim of this work is to predict which stars are most likely to host an Earth-like planet (ELP) in order to avoid blind searches, minimise detection times, and thus maximise the number of detections. Using a previous study on correlations between the presence of an ELP and the properties of its system, we trained a Random Forest to recognise and classify systems as 'hosting an ELP' or 'not hosting an ELP'. The Random Forest was trained and tested on populations of synthetic planetary systems derived from the Bern model, and then applied to real observed systems. The tests conducted on the machine learning (ML) model yield precision scores of up to 0.99, indicating that 99% of the systems identified by the model as having ELPs possess at least one. Among the few real observed systems that have been tested, 44 have been selected as having a high probability of hosting an ELP, and a quick study of the stability of these systems confirms that the presence of an Earth-like planet within them would leave them stable. The excellent results obtained from the tests conducted on the ML model demonstrate its ability to recognise the typical architectures of systems with or without ELPs within populations derived from the Bern model. If we assume that the Bern model adequately describes the architecture of real systems, then such a tool can prove indispensable in the search for Earth-like planets. A similar approach could be applied to other planetary system formation models to validate those predictions.
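To make the quoted precision score concrete, the following minimal sketch computes precision as the fraction of systems flagged as ELP hosts that truly host one. The labels are made up for illustration and are not the paper's data; the function name `precision` is a hypothetical helper, not the authors' code.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP): among systems predicted to host an ELP,
    the fraction that actually host at least one."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if tp + fp == 0:
        raise ValueError("no system was predicted to host an ELP")
    return tp / (tp + fp)


# Illustrative labels only (assumption, not taken from the paper):
# 1 = 'hosting an ELP', 0 = 'not hosting an ELP'.
# The classifier flags the first 100 systems; 99 of them truly host an ELP.
y_true = [1] * 99 + [0] * 1 + [1] * 20 + [0] * 80
y_pred = [1] * 100 + [0] * 100

print(precision(y_true, y_pred))  # 0.99
```

Note that precision says nothing about the 20 true hosts the toy classifier missed; a high precision is the relevant property when the goal is to avoid spending observation time on systems without an ELP.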

Paper Structure

This paper contains 19 sections, 7 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Representation of 16 systems with ELP (left) and 16 systems without ELP (right) in a semi-major axis - planetary mass diagram (in log scale for both axes). Blue dots represent 'detectable' planets and yellow dots 'undetectable' planets.
  • Figure 2: Bee swarm plot of the seven features considered. The x-axis represents the SHAP value of the feature for each instance, and the y-axis represents the seven features considered ranked from the most important (top) to the least (bottom). The colour of the dots represents the value of the feature itself, red being high values and blue being low values.
  • Figure 3: Systems around G stars with a resulting voting rate above 90%. The green areas represent the definition of an Earth-like planet in the study in terms of equilibrium temperature and mass. The grey areas represent the combinations of mass and semi-major axis for which the Hill-stability criterion is met with the addition of a new planet. The black dots correspond to planets for which we know the mass, and the orange dots correspond to planets for which we only know the radius and whose mass has been derived following Parc2024.
  • Figure 4: Systems around early-M and late-K stars with a resulting voting rate above 90%. The green areas represent the definition of an Earth-like planet in the study in terms of equilibrium temperature and mass. The grey areas represent the combinations of mass and semi-major axis for which the Hill-stability criterion is met with the addition of a new planet. The black dots correspond to the planets already known in these systems.
  • Figure 5: Systems around late-M stars with a resulting voting rate above 90%. The green areas represent the definition of an Earth-like planet in the study in terms of equilibrium temperature and mass. The grey areas represent the combinations of mass and semi-major axis for which the Hill-stability criterion is met with the addition of a new planet. The dots represent the planets already known in those systems: the black dots for planets with a RV semi-amplitude above the threshold of detection bias (detectable planets) and the grey dots for the planets with a RV semi-amplitude below this threshold. Only the detectable planets count in the calculation of the architecture of the systems.
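The distinction between 'detectable' and 'undetectable' planets in the captions above rests on a radial-velocity (RV) semi-amplitude threshold. As a rough sketch, the standard Keplerian expression for the semi-amplitude can be evaluated as follows. The threshold value `K_THRESHOLD_EXAMPLE` is a placeholder chosen for illustration, not the value used in the paper, and the function names are hypothetical.

```python
import math

G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.98847e30       # solar mass, kg
M_EARTH = 5.9722e24      # Earth mass, kg
AU = 1.495978707e11      # astronomical unit, m

# Placeholder threshold in m/s (assumption for illustration only).
K_THRESHOLD_EXAMPLE = 1.0


def rv_semi_amplitude(m_planet_earth, a_au, m_star_sun, ecc=0.0, sin_i=1.0):
    """RV semi-amplitude K in m/s for a planet of mass m_planet_earth
    (Earth masses) at semi-major axis a_au (AU) around a star of mass
    m_star_sun (solar masses)."""
    m_p = m_planet_earth * M_EARTH
    m_s = m_star_sun * M_SUN
    a = a_au * AU
    # Kepler's third law gives the orbital period from the semi-major axis.
    period = 2.0 * math.pi * math.sqrt(a**3 / (G * (m_s + m_p)))
    return ((2.0 * math.pi * G / period) ** (1.0 / 3.0)
            * m_p * sin_i
            / (m_s + m_p) ** (2.0 / 3.0)
            / math.sqrt(1.0 - ecc**2))


def is_detectable(k, threshold=K_THRESHOLD_EXAMPLE):
    """True if the semi-amplitude reaches the assumed detection threshold."""
    return k >= threshold


k_earth = rv_semi_amplitude(1.0, 1.0, 1.0)      # Earth analogue around a Sun-like star
k_jupiter = rv_semi_amplitude(317.8, 5.2, 1.0)  # Jupiter analogue
print(k_earth, is_detectable(k_earth))
print(k_jupiter, is_detectable(k_jupiter))
```

For an Earth analogue around a Sun-like star this gives roughly 0.09 m/s, far below the Jupiter analogue's value of about 12 m/s, which illustrates why ELPs themselves typically fall on the 'undetectable' side of an RV bias and must be inferred from the detectable part of the system architecture.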