Identification of Gamma Ray Pulsar Candidates in the \emph{Fermi}-LAT 4FGL-DR4 Unassociated Sources Using Supervised Machine Learning
A. Pathania, K. K. Singh, S. K. Singh, A. Tolamatti, B. B. Singh, K. K. Yadav
TL;DR
This study tackles the challenge of classifying unassociated \emph{Fermi}-LAT sources in the 4FGL-DR4 catalog as pulsars or active galactic nuclei (AGN) using two supervised learning algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGB). It derives a compact, informative feature set from long-term gamma-ray data, applying Kolmogorov-Smirnov (KS) and Kendall-$\tau$ screening followed by Recursive Feature Elimination (RFE) to select eight key features. These include the log-parabola curvature parameter $\beta_{LP}$ and spectral and temporal indicators such as $E_c^{PLEC}$ and the hardness ratios (HRs). On known sources, the classifiers perform well ($>97\%$ accuracy, and $\sim99\%$ balanced accuracy for XGB with SMOTE oversampling). Applied to 2257 unassociated sources, they sort each source into a pulsar, AGN, or ambiguous category, identifying hundreds of pulsar candidates. A covariate-shift analysis gives more realistic, lower performance estimates for faint, newly identified sources, with XGB generally outperforming RF under shift. The resulting candidate list is a practical resource for radio and very-high-energy follow-up observations and for pulsar population studies, with predictions consistent with contemporary pulsar surveys such as TRAPUM and FAST.
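The KS and Kendall-$\tau$ screening stages named above can be illustrated with a minimal, dependency-free Python sketch. It assumes one plausible reading of the pipeline: the two-sample KS statistic ranks each feature by how differently it is distributed in pulsars and AGN, and Kendall's $\tau$ (tau-a here, with tied pairs scored as zero) flags pairs of features that carry redundant rank information. The thresholds `ks_min` and `tau_max`, the greedy "keep the more separable feature of a correlated pair" rule, and the toy feature names and values are all hypothetical. The subsequent RFE step, which needs a fitted classifier, is not shown.

```python
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / n_pairs; tied pairs count as 0."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def screen_features(psr, agn, ks_min=0.3, tau_max=0.8):
    """Hypothetical two-stage screen; psr/agn map feature name -> values per class."""
    # Stage 1 (KS): keep features whose class-conditional distributions differ enough.
    ks = {f: ks_statistic(psr[f], agn[f]) for f in psr}
    ranked = sorted((f for f in ks if ks[f] >= ks_min), key=ks.get, reverse=True)
    # Stage 2 (Kendall tau): visit features by decreasing KS and skip any feature
    # strongly rank-correlated (|tau| > tau_max) with one already selected.
    selected = []
    for f in ranked:
        pooled = psr[f] + agn[f]
        if all(abs(kendall_tau(pooled, psr[g] + agn[g])) <= tau_max for g in selected):
            selected.append(f)
    return selected, ks


# Toy values (hypothetical, not catalog data).
toy_psr = {"curvature_toy": [0.30, 0.35, 0.40, 0.45],
           "curvature_copy": [0.60, 0.70, 0.80, 0.90],
           "noise_toy": [1.0, 2.0, 3.0, 4.0]}
toy_agn = {"curvature_toy": [0.05, 0.08, 0.10, 0.12],
           "curvature_copy": [0.10, 0.16, 0.20, 0.24],
           "noise_toy": [1.0, 2.0, 3.0, 4.0]}

selected, ks = screen_features(toy_psr, toy_agn)
print(selected)  # ['curvature_toy']
```

With this toy input, `curvature_toy` and its rescaled copy both separate the classes perfectly (KS = 1), but the copy is perfectly rank-correlated with it and is dropped, while `noise_toy` fails the KS cut. For real catalog columns, `scipy.stats.ks_2samp` and `scipy.stats.kendalltau` are tested implementations that also return p-values.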
Abstract
The Large Area Telescope (LAT) on board the \emph{Fermi} Gamma-ray Space Telescope has been continuously providing high-quality survey data of the entire sky in the high-energy range from 30 MeV to 500 GeV and beyond since August 2008. A series of gamma-ray source catalogs has been published from comprehensive analyses of the \emph{Fermi}-LAT data. The most recent release of the fourth \emph{Fermi}-LAT catalog of gamma-ray sources (4FGL-DR4), based on the first 14 years of observations in the 50 MeV--1 TeV energy band, contains 7195 sources. A large fraction ($\sim$33\%) of these sources have no known counterparts in lower-energy wavebands. Such high-energy gamma-ray sources are referred to as unassociated or unidentified. Classifying these objects into known types of gamma-ray sources, such as active galactic nuclei or pulsars, is essential for population studies and for targeted multi-wavelength observations that probe the underlying radiative processes. In this work, we perform a detailed classification of the unassociated sources reported in the 4FGL-DR4 catalog using two supervised machine learning techniques: Random Forest and Extreme Gradient Boosting. We focus mainly on identifying new gamma-ray pulsar candidates, using observational features that are derived from long-term \emph{Fermi}-LAT observations and reported in the incremental 4FGL-DR4 catalog. We also explore the effect of data balancing on the classification of the \emph{Fermi}-LAT unassociated sources.
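The data-balancing step can be illustrated with a minimal, dependency-free Python sketch of SMOTE (Synthetic Minority Over-sampling Technique), the balancing method named in the TL;DR. Known pulsars are far fewer than known AGN among the associated sources, so the pulsar class is the minority: each synthetic sample is placed at a random point on the segment joining a minority sample to one of its k nearest minority-class neighbours. The function name `smote`, the two-feature toy values, `k=3` and the fixed seed are illustrative assumptions, not the configuration used in this work.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: interpolate between minority samples and their
    k nearest minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # For each minority sample, indices of its k nearest minority neighbours.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x, nn = minority[i], minority[rng.choice(neighbours[i])]
        u = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Toy two-feature training set (hypothetical values): 4 "pulsars", 10 "AGN".
toy_psr = [[0.40, 1.0], [0.50, 1.2], [0.45, 0.9], [0.55, 1.1]]
toy_agn = [[0.10, 2.0 + 0.1 * i] for i in range(10)]

new_psr = smote(toy_psr, n_new=len(toy_agn) - len(toy_psr), k=3)
balanced_psr = toy_psr + new_psr
print(len(balanced_psr), len(toy_agn))  # 10 10
```

Oversampling of this kind should be applied only to the training split, so that synthetic points never leak into the evaluation set. In practice, the `SMOTE` class in imbalanced-learn (`imblearn.over_sampling.SMOTE`, used via `fit_resample`) is a tested implementation of the same idea.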
