QHSC: The Quasar Candidate Catalog for the Hyper Suprime-Cam Subaru Strategic Program
Rui Zhu, Xue-Bing Wu, Yuxuan Pang, Yuming Fu
TL;DR
The paper presents QHSC, a deep, machine-learning-selected catalog of quasar candidates in the HSC-SSP Wide layer, built from four photometric parent samples and evaluated against several deep spectroscopic datasets. It uses XGBoost classifiers for quasar selection and a bagging ensemble of XGBoost regressors for photometric redshift estimation, achieving high completeness (>$85\%$) and substantial purity, especially when mid-infrared data from WISE are included. Near-infrared data from UKIDSS/VISTA and, to a lesser extent, SCUSS $u$-band data further improve the redshift estimates, reducing the catastrophic outlier fraction to $\sim 10\%$ in the optimized samples. The resulting QHSC catalog is publicly available; it supports studies of quasars and cosmology and demonstrates the viability of ensemble ML approaches for quasar selection in upcoming wide and deep imaging surveys.
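The bagging idea behind the photometric-redshift estimator can be illustrated with a short, self-contained sketch. To stay dependency-free, it swaps in a hypothetical k-nearest-neighbour regressor as the base learner in place of XGBoost and combines the bootstrap members with a median. Both choices are assumptions for illustration, not the paper's configuration. The outlier criterion $|z_{\rm phot} - z_{\rm spec}|/(1+z_{\rm spec}) > 0.1$ is likewise an assumed, commonly used definition rather than one stated here.

```python
import math
import random
from statistics import median


def knn_predict(train_x, train_y, x, k=3):
    """Stand-in base regressor (assumption: replaces XGBoost in this sketch).

    Returns the mean target of the k training points nearest to x.
    """
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def bagged_predict(train_x, train_y, x, n_models=25, k=3, seed=0):
    """Bootstrap aggregating: fit the base learner on n_models resamples
    drawn with replacement, then take the median of their predictions
    (median combination is an assumption; a mean is also common)."""
    rng = random.Random(seed)
    n = len(train_x)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap resample
        bx = [train_x[i] for i in idx]
        by = [train_y[i] for i in idx]
        preds.append(knn_predict(bx, by, x, k=k))
    return median(preds)


def outlier_fraction(z_spec, z_phot, threshold=0.1):
    """Fraction of objects with |z_phot - z_spec| / (1 + z_spec) > threshold.

    The 0.1 threshold is an assumed definition, not taken from the paper.
    """
    n_out = sum(
        abs(zp - zs) / (1.0 + zs) > threshold for zs, zp in zip(z_spec, z_phot)
    )
    return n_out / len(z_spec)


# Toy training set: one "colour" feature with z = 0.5 + colour (hypothetical).
train_x = [(i / 10,) for i in range(50)]
train_y = [0.5 + i / 10 for i in range(50)]
z_hat = bagged_predict(train_x, train_y, (2.0,))
print(f"bagged photo-z at colour 2.0: {z_hat:.2f}")
```

Averaging over bootstrap resamples mainly reduces the variance of the individual regressors, which is why bagging tends to suppress catastrophic outliers driven by noisy training points.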
Abstract
The Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) is a deep, wide-field, multi-band imaging survey consisting of three layers (Wide, Deep, and UltraDeep), with the Wide layer covering $\sim 1470$ deg$^2$ to a depth of $i \sim 26$ mag. We present the QHSC catalog, a machine-learning-selected sample of quasar candidates with photometric redshifts in the Wide layer of HSC-SSP Public Data Release 3. The full QHSC catalog contains four distinct samples: a master sample with HSC-only photometry, an HSC+WISE sample, and two samples including near-infrared data from UKIDSS and VISTA, denoted GoldenU and GoldenV. For each sample, an XGBoost classifier is trained and evaluated using independent spectroscopic test sets from HETDEX, VVDS, and zCOSMOS-bright. The numbers of quasar candidates in the QHSC catalog are 1,184,574 (master), 371,777 (HSC+WISE), 87,460 (GoldenU), and 120,572 (GoldenV), with respective completeness values of 85.3%, 92.7%, 89.8%, and 91.3%. We develop ensemble photometric redshift estimators based on bootstrap aggregating (bagging) of multiple XGBoost regressors, achieving outlier fractions of 21.7%, 13.1%, 9.5%, and 10.7% for these samples. The catalog provides quasar classification probabilities ($p_{\rm QSO}$), which can be thresholded to construct purer subsamples. This work offers a valuable resource for studies of quasars and cosmology, and highlights the effectiveness of machine learning for quasar selection in future wide and deep imaging surveys. The catalog is publicly available at https://doi.org/10.5281/zenodo.17515028.
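Completeness, purity, and the way a $p_{\rm QSO}$ cut trades one for the other can be sketched as follows. The function name, the toy probabilities and labels, and the default threshold of 0.5 are hypothetical. In practice the labels would come from spectroscopic test sets such as HETDEX, VVDS, or zCOSMOS-bright.

```python
def completeness_purity(p_qso, is_quasar, threshold=0.5):
    """Return (completeness, purity) for candidates with p_qso >= threshold.

    completeness = selected true quasars / all true quasars
    purity       = selected true quasars / all selected objects
    The default threshold of 0.5 is an illustrative assumption.
    """
    selected = [p >= threshold for p in p_qso]
    n_hit = sum(s and q for s, q in zip(selected, is_quasar))
    n_true = sum(is_quasar)
    n_sel = sum(selected)
    completeness = n_hit / n_true if n_true else float("nan")
    purity = n_hit / n_sel if n_sel else float("nan")
    return completeness, purity


# Toy spectroscopic test set (hypothetical values, not QHSC data).
p_qso = [0.95, 0.90, 0.70, 0.60, 0.40, 0.20]
is_quasar = [True, True, False, True, False, False]

for cut in (0.5, 0.8):
    c, p = completeness_purity(p_qso, is_quasar, threshold=cut)
    print(f"p_QSO >= {cut}: completeness={c:.2f}, purity={p:.2f}")
```

Raising the cut from 0.5 to 0.8 in this toy example lifts purity from 0.75 to 1.00 while completeness drops from 1.00 to 0.67. This is the trade-off involved in building purer subsamples by thresholding $p_{\rm QSO}$.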
