Using machine learning method for variable star classification using the TESS Sectors 1-57 data

Li-Heng Wang, Kai Li, Xiang Gao, Ya-Ni Guo, Guo-You Sun

TL;DR

This work tackles large-scale automated classification of variable stars in TESS 2-minute data (Sectors 1-57) by leveraging Gaia DR3 labels and an interpretable feature set derived from Fourier analysis and phase-folded light curves. A two-stage Random Forest pipeline first performs a coarse classification into four main types (eclipsing binaries (EBs), pulsators, ROT, and non-variables) and then subclassifies within each category, supported by period determination with the Generalized Lomb-Scargle (GLS) periodogram and careful feature extraction. The approach achieves an out-of-bag (OOB) score of 0.9178 and produces catalogs for seven variable types (EA, EW, CEP, DSCT, RRab, RRc, ROT) with 14092 new discoveries, including 6245 new EBs; results are validated through visual inspection and cross-matching with Gaia and external catalogs. The interpretable methodology, applied at the scale of the full dataset, shows practical potential for building comprehensive variable-star catalogs from space-based surveys, while acknowledging labeling and data-heterogeneity limitations that guide future refinements.
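To make the period-determination step concrete, the following is a minimal sketch of a GLS periodogram (a sinusoid plus a floating mean fitted at each trial frequency) followed by phase folding. It assumes equal weights for all points and a user-supplied frequency grid; the function names `gls_power`, `best_period`, and `phase_fold` are hypothetical and are not taken from the paper, whose actual implementation and settings are not described here.

```python
import math


def gls_power(times, fluxes, frequency):
    """Normalized GLS power in [0, 1] at one trial frequency.

    Assumption: every point has equal weight (no per-point errors used).
    """
    n = len(times)
    w = 1.0 / n
    omega = 2.0 * math.pi * frequency
    cos_t = [math.cos(omega * t) for t in times]
    sin_t = [math.sin(omega * t) for t in times]

    # Weighted means of the data and of the two basis functions.
    y_mean = w * sum(fluxes)
    c_mean = w * sum(cos_t)
    s_mean = w * sum(sin_t)

    # Mean-subtracted second moments (this is what distinguishes GLS
    # from the classical Lomb-Scargle periodogram).
    yy = w * sum(y * y for y in fluxes) - y_mean ** 2
    yc = w * sum(y * c for y, c in zip(fluxes, cos_t)) - y_mean * c_mean
    ys = w * sum(y * s for y, s in zip(fluxes, sin_t)) - y_mean * s_mean
    cc = w * sum(c * c for c in cos_t) - c_mean ** 2
    ss = w * sum(s * s for s in sin_t) - s_mean ** 2
    cs = w * sum(c * s for c, s in zip(cos_t, sin_t)) - c_mean * s_mean

    d = cc * ss - cs ** 2
    if yy <= 0.0 or d <= 1e-12:
        return 0.0  # constant data or degenerate frequency
    return (ss * yc ** 2 + cc * ys ** 2 - 2.0 * cs * yc * ys) / (yy * d)


def best_period(times, fluxes, frequencies):
    """Return the period (1 / frequency) with the highest GLS power."""
    best_f = max(frequencies, key=lambda f: gls_power(times, fluxes, f))
    return 1.0 / best_f


def phase_fold(times, period, epoch=0.0):
    """Map each time stamp to a phase in [0, 1)."""
    return [((t - epoch) / period) % 1.0 for t in times]


if __name__ == "__main__":
    # Synthetic, noise-free sinusoid with a 2.5-day period (illustrative only).
    t = [0.1 * i for i in range(300)]
    y = [1.0 + 0.3 * math.sin(2.0 * math.pi * ti / 2.5) for ti in t]
    grid = [0.05 + 0.005 * k for k in range(391)]  # cycles per day
    p = best_period(t, y, grid)
    print(f"recovered period: {p:.3f} d")
```

The frequency grid matters in practice: it should extend to the shortest period of interest and be fine enough that a narrow peak is not missed. The folded phases can then feed whatever phase-domain features a classifier uses.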

Abstract

The Transiting Exoplanet Survey Satellite (TESS) is a wide-field all-sky survey mission designed to detect Earth-sized exoplanets. After more than four years of photometric surveys, data from Sectors 1-57 were collected, including approximately 1,050,000 light curves with a 2-minute cadence. By cross-matching the data with Gaia's variable star catalogue, we obtained labeled datasets for further analysis. Using a random forest classifier, we classified the variable stars and designed a distinct classification process for each subclass, identifying 6770 EA, 2971 EW, 980 CEP, 8347 DSCT, 457 RRab, 404 RRc, and 12348 ROT. Each variable star was then visually inspected to ensure the reliability and accuracy of the compiled catalog. The final catalog contains 6046 EA, 3859 EW, 2058 CEP, 8434 DSCT, 482 RRab, 416 RRc, and 9694 ROT, and a total of 14092 new variable stars were discovered.
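The abstract gives two sets of per-class counts, one from the classifier and one after visual inspection. The short tally below only does arithmetic on those quoted numbers to show how each class changed; the variable names are illustrative, and the "new fraction" is simply 14092 divided by the final catalog total, not a statistic reported by the authors.

```python
# Per-class counts quoted in the abstract.
classifier_counts = {
    "EA": 6770, "EW": 2971, "CEP": 980, "DSCT": 8347,
    "RRab": 457, "RRc": 404, "ROT": 12348,
}
final_counts = {
    "EA": 6046, "EW": 3859, "CEP": 2058, "DSCT": 8434,
    "RRab": 482, "RRc": 416, "ROT": 9694,
}
new_discoveries = 14092  # total new variable stars quoted in the abstract

# Change per class between classifier output and the inspected catalog.
changes = {k: final_counts[k] - classifier_counts[k] for k in final_counts}

classifier_total = sum(classifier_counts.values())
final_total = sum(final_counts.values())
new_fraction = new_discoveries / final_total

if __name__ == "__main__":
    for name, delta in changes.items():
        print(f"{name:>4}: {classifier_counts[name]:>6} -> "
              f"{final_counts[name]:>6} ({delta:+d})")
    print(f"total: {classifier_total} -> {final_total}")
    print(f"new discoveries as a fraction of the final catalog: {new_fraction:.3f}")
```

On these numbers the largest changes are in ROT (a decrease of 2654) and CEP (an increase of 1078), and the overall total falls from 32277 to 30989.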

Paper Structure

This paper contains 12 sections, 3 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Confusion matrix for an RF classifier trained on labeled data. The x-axis is the predicted category and the y-axis is the input category.
  • Figure 2: Confusion matrix obtained with all labeled data (except non-variable stars) as input.
  • Figure 3: Distribution of the $c\_bin$ parameter (the number of points with values below 0.5 after normalizing the light curve). The abscissa is the parameter value and the ordinate is the count.
  • Figure 4: Statistical distribution of amplitude.
  • Figure 5: Typical light curves for each category; from left to right: the original light curve, the phase-folded light curve, and the GLS periodogram.
  • ...and 7 more figures
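
The $c\_bin$ feature described in the Figure 3 caption can be sketched as follows. The caption does not say how the light curve is normalized, so this sketch assumes min-max scaling to the range [0, 1]; the function name `c_bin` and the `threshold` argument are illustrative choices, not the authors' code.

```python
def c_bin(fluxes, threshold=0.5):
    """Count points whose normalized flux falls below `threshold`.

    Assumption: normalization is min-max scaling to [0, 1]; the paper's
    exact normalization is not stated in the caption.
    """
    lo, hi = min(fluxes), max(fluxes)
    if hi == lo:
        return 0  # flat light curve: nothing to normalize
    span = hi - lo
    return sum(1 for f in fluxes if (f - lo) / span < threshold)


if __name__ == "__main__":
    # Eclipse-like curve: mostly at maximum light with a brief deep dip.
    eclipse_like = [1.0] * 8 + [0.0, 0.25]
    # Roughly symmetric variation around the midpoint.
    symmetric = [0.0, 0.25, 0.75, 1.0]
    print(c_bin(eclipse_like), "of", len(eclipse_like))
    print(c_bin(symmetric), "of", len(symmetric))
```

Under this assumption, a light curve that spends most of its time near maximum brightness with short dips would be expected to give a small count relative to its length, while a roughly sinusoidal curve would put about half of its points below 0.5.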