Machine learning-based identification of Gaia astrometric exoplanet orbits

Johannes Sahlmann, Pablo Gómez

TL;DR

This work develops a machine learning approach that uses only the Gaia DR3 orbital solutions to identify the best candidates for exoplanet and brown-dwarf companions. It yields a list of 20 best candidates, of which two are exoplanet candidates and another five are either very massive brown dwarfs or very-low-mass stars.

Abstract

The third Gaia data release (DR3) contains $\sim$170\,000 astrometric orbit solutions of two-body systems located within $\sim$500 pc of the Sun. Determining component masses in these systems, in particular of stars hosting exoplanets, usually hinges on incorporating complementary observations in addition to the astrometry, e.g. spectroscopy and radial velocities. Several Gaia DR3 two-body systems with exoplanet, brown-dwarf, stellar, and black-hole components have been confirmed in this way. We developed an alternative machine learning approach that uses only the Gaia DR3 orbital solutions with the aim of identifying the best candidates for exoplanet and brown-dwarf companions. Based on confirmed substellar companions in the literature, we use semi-supervised anomaly detection methods in combination with extreme gradient boosting and random forest classifiers to determine likely low-mass outliers in the population of non-single sources. We employ and study feature importance to investigate the method's plausibility and produce a list of 20 best candidates, of which two are exoplanet candidates and another five are either very massive brown dwarfs or very-low-mass stars. Three candidates, including one initial exoplanet candidate, correspond to false-positive solutions where longer-period binary star motion was fitted with a biased shorter-period orbit. We highlight nine candidates with brown-dwarf companions for preferential follow-up. The companion around the Sun-like star G\,15-6 could be confirmed as a genuine brown dwarf using external radial-velocity data. This new approach is a powerful complement to the traditional identification methods for substellar companions among Gaia astrometric orbits. It is particularly relevant in the context of Gaia DR4 and its expected exoplanet discovery yield.

Paper Structure (20 sections, 1 equation, 15 figures, 8 tables)

Figures (15)

  • Figure 1: Overview of the data and different partitionings (Datasets D1, D2, D3, D4) used. Source preselection was performed as described in Section \ref{label_definition}.
  • Figure 2: Overview of the system architecture used for identifying candidates.
  • Figure 3: Density histograms of absolute magnitude as a function of colour (top panel; the inset shows a zoom into the region of interest) and mass function (bottom) for all Gaia DR3 astrometric orbits. The blue contours indicate the concentration of the solutions labelled as preselected sources. Circles indicate the 10 confirmed exoplanets, squares indicate the 14 confirmed BD-companions, and crosses indicate our 22 best candidates for substellar companions discussed in the text. For reference, the Sun has $M_G=4.67$ and $G_\mathrm{BP}-G_\mathrm{RP}=0.82$ (2018MNRAS.479L.102C), and the mass function of a $5\,M_J$ planet in a Jupiter-like orbit around the Sun is $1.1 \cdot 10^{-7}$. These reference locations are marked with a yellow diamond. The exoplanet with the largest mass function is HD 39392 b (Wilson:2016aa; 2023MNRAS.526.5155S).
  • Figure 4: SHAP value distribution for the top five features, i.e. input parameters such as the radial velocity error, of the top 50 sources identified in each of the 8 configurations (8$\times$50 sources in total). Colour indicates the distribution of the feature; in particular, smaller values of the variables are associated with higher SHAP values. The top five features are those with the highest average over the entire sample.
  • Figure 5: SHAP value distribution for the top five features, i.e. input parameters such as the radial velocity error, in a random sample of 50 sources per configuration. Colour indicates the distribution of the feature; in particular, larger values of the variables are associated with smaller SHAP values. The top five features are those with the highest average over the entire sample.
  • ...and 10 more figures
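
The reference mass function quoted in the Figure 3 caption can be checked with a short calculation. The sketch below assumes the standard astrometric mass function for a dark companion, $f = M_2^3/(M_1+M_2)^2 = a_0^3/P^2$ (solar masses, au, years), which may differ in detail from the definition used in the paper. The function names and the Jupiter-to-solar mass ratio are illustrative choices, not taken from the source.

```python
# Minimal sketch, assuming a dark companion so that the photocentre orbit
# equals the primary's orbit around the barycentre.
# All names below are hypothetical and only serve this illustration.

M_JUP_IN_MSUN = 1.0 / 1047.35  # approximate Jupiter mass in solar masses


def mass_function(m_primary, m_companion):
    """Mass function f = M2^3 / (M1 + M2)^2, with masses in solar masses."""
    return m_companion**3 / (m_primary + m_companion) ** 2


def mass_function_from_orbit(a0_mas, parallax_mas, period_days):
    """Mass function f = a0^3 / P^2 from astrometric orbit parameters.

    a0_mas / parallax_mas converts the angular semi-major axis to au;
    the period is converted from days to Julian years.
    """
    a0_au = a0_mas / parallax_mas
    period_yr = period_days / 365.25
    return a0_au**3 / period_yr**2


def companion_mass(f, m_primary):
    """Invert the mass function for M2 (solar masses) by bisection.

    f increases monotonically with M2 for M2 > 0, so bisection is safe.
    """
    lo, hi = 0.0, 1.0
    while mass_function(m_primary, hi) < f:
        hi *= 2.0  # widen the bracket until it contains the solution
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mass_function(m_primary, mid) < f:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# A 5 M_J planet around a 1 M_sun star, as in the Figure 3 caption.
f_ref = mass_function(1.0, 5 * M_JUP_IN_MSUN)
print(f"{f_ref:.1e}")
```

Under these assumptions the result rounds to the $1.1 \cdot 10^{-7}$ given in the caption. Note that the mass-based form does not depend on the orbit size, so the "Jupiter-like orbit" matters only when the value is derived from $a_0$ and $P$. The inverse function illustrates why an external estimate of the primary mass is still needed to turn a mass function into a companion mass.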