Estimating the completeness of the QUBRICS Survey with 3501 QSO redshifts from Gaia DR3 spectra

Matteo Porru, Stefano Cristiani, Francesco Guarneri, Giorgio Calderone, Andrea Grazian, Konstantina Boutsia, Andrea Trost, Valentina D'Odorico, Guido Cupani, Catarina M. J. Marques, Francesco Chiti Tegli, Fabio Fontanot

Abstract

QSOs are essential for investigating the structure and evolution of the Universe. Historically, their identification has been concentrated in the northern hemisphere, primarily because of the sky coverage of major astronomical surveys. The QUBRICS survey, started in 2019 to address this asymmetry, has identified more than 1300 new bright (i<19.5) high-redshift (2.5<z<6) QSOs in the southern sky. We aim to quantify, using an independent QSO sample, the completeness and recall of the QUBRICS QSO selection methods, which are based on XGB (eXtreme Gradient Boosting) and PRF (Probabilistic Random Forest); completeness is a fundamental metric for ensuring the statistical robustness of QSO-based cosmological investigations. We analyzed a subset of Gaia DR3 sources with low-resolution spectra, obtaining a sample of 3501 QSOs. To determine how many QSOs were correctly identified as candidates, we crossmatched this independent sample with the datasets used for selection: 894 QSOs with z>2.5 fell within the XGB dataset footprint, of which 152 were unclassified and thus eligible for completeness testing. Similarly, 675 QSOs with z>2.5 fell within the PRF dataset footprint, of which 69 were unclassified. The XGB correctly identified as candidates 136 (89%) of the 152 unclassified QSOs with z>2.5 in its dataset; the PRF correctly identified as candidates 46 (66%) of the 69 unclassified QSOs with z>2.5 in its dataset. These findings confirm the high efficiency of the QUBRICS selection methods (recall=89%) and provide the completeness estimate for spectroscopically confirmed QSOs (82%) that is necessary for cosmological studies using QUBRICS data. This work also provides reliable redshifts for 1223 new QSOs (median redshift z=2.1, median magnitude G=17.8), which will help improve the performance of future selections.
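The recall figures quoted in the abstract follow directly from the candidate counts. The short sketch below recomputes them and, as an illustrative addition that is not taken from the paper, attaches a Wilson score interval to show how much statistical uncertainty samples of this size carry. The function names `recall` and `wilson_interval` and the 95% confidence level are assumptions made for this example. The 82% completeness estimate depends on inputs not given in the abstract and is not reproduced here.

```python
import math


def recall(found: int, total: int) -> float:
    """Fraction of independently confirmed QSOs that the selection flagged as candidates."""
    return found / total


def wilson_interval(found: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Illustrative only: the interval and the default z = 1.96 (about 95%)
    are assumptions of this sketch, not values quoted in the paper.
    """
    p = found / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


# Counts taken from the abstract: (candidates recovered, unclassified QSOs with z > 2.5).
counts = {"XGB": (136, 152), "PRF": (46, 69)}

for method, (found, total) in counts.items():
    low, high = wilson_interval(found, total)
    print(f"{method}: recall = {recall(found, total):.3f} "
          f"(Wilson interval {low:.3f} to {high:.3f})")
```

The PRF sample is less than half the size of the XGB sample, so its interval is correspondingly wider. Note also that 46/69 is about 0.667, which the abstract quotes as 66%.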

Paper Structure

This paper contains 11 sections, 8 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: The Gaia G magnitude of the 3501 QSOs of the independent sample plotted vs the $z_{\rm QU\_G}$ redshifts determined on the basis of the Gaia low-resolution spectra. 2278 previously known QSOs are highlighted in blue, while 1223 newly identified QSOs are highlighted in red.
  • Figure 2: The difference between the redshifts determined on the basis of the Gaia low-resolution spectra and the spectroscopic redshifts, $\Delta z$, as a function of the spectroscopic redshift. The dashed red lines mark the $\Delta z=\pm5\sigma_z$ threshold chosen to identify catastrophic discrepancies.
  • Figure 3: Top panel: histogram of the Gaia G magnitude for the 152 QSOs with no classification in the XGB sample, with the 16 QSOs that were not identified as candidates ("missed") highlighted in red. Bottom panel: histogram of the $z_{\rm QU\_G}$ redshifts for the same 152 QSOs, with the 16 missed QSOs highlighted in red.
  • Figure 4: Top panel: histogram of the Gaia G magnitude for the 69 QSOs with no classification in the PRF sample, with the 23 QSOs that were not identified as candidates ("missed") highlighted in red. Bottom panel: histogram of the $z_{\rm QU\_G}$ redshifts for the same 69 QSOs, with the 23 missed QSOs highlighted in red.
  • Figure 5: Schematic representation of the datasets used in the QUBRICS survey and in this paper. The outermost black rectangle contains all the sources within a given footprint and magnitude range. The green dashed rectangle denotes the dataset used in the XGB selection. Vertical divisions separate the sources according to their true category, while horizontal divisions separate the sources according to their label in the database. Known uninteresting sources (stars, galaxies, low-redshift QSOs) are in the bottom left quadrant; known high-redshift QSOs are in the bottom right quadrant; unclassified sources that are stars, galaxies or low-redshift QSOs are in the top left quadrant; unclassified sources that are high-redshift QSOs are in the top right quadrant. The region of interest is highlighted in cyan and marked by the letter U: this is the set of all true high-redshift QSOs that are unclassified in the dataset. The red rectangle represents the set of QSO candidates predicted by the XGB, and its intersection C with the U region is the set of unclassified high-redshift QSOs that are also candidates. The blue rectangle represents the set of Gaia QSOs, and its intersection G with the cyan U region is the set of Gaia QSOs that are unclassified in the dataset.
  • ...and 3 more figures
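
The set relations described in the Figure 5 caption can be illustrated with a few lines of Python. This is a toy sketch under stated assumptions: the identifiers are invented, the function name `recall_from_sets` is hypothetical, and the recall is taken to be the fraction of G (Gaia QSOs that are unclassified high-redshift QSOs in the dataset) that also falls in C (the candidates), which is one reading of the caption and the abstract rather than the authors' code.

```python
def recall_from_sets(gaia_qsos: set, unclassified_highz: set, candidates: set):
    """Return (recall, missed) for an independent sample, using plain set algebra.

    U = unclassified_highz : true high-redshift QSOs with no label in the dataset
    C = candidates & U     : unclassified high-redshift QSOs flagged as candidates
    G = gaia_qsos & U      : independent (Gaia) QSOs that are unclassified in the dataset
    """
    G = gaia_qsos & unclassified_highz
    C = candidates & unclassified_highz
    recovered = G & C           # independent QSOs the selection did flag
    missed = G - C              # independent QSOs the selection did not flag
    return len(recovered) / len(G), missed


# Toy identifiers, purely illustrative; these are not real catalogue entries.
U = {"q1", "q2", "q3", "q4", "q5", "q6"}
xgb_candidates = {"q1", "q2", "q4", "s7", "s8"}   # s7, s8: contaminants outside U
gaia_sample = {"q1", "q2", "q3", "q4", "k9"}      # k9: already classified, outside U

toy_recall, toy_missed = recall_from_sets(gaia_sample, U, xgb_candidates)
print(f"recall = {toy_recall:.2f}, missed = {sorted(toy_missed)}")
```

In this toy case three of the four members of G are candidates, so the recall is 0.75 and the single missed object is `q3`. With the counts reported for the XGB dataset, G would contain 152 objects and the recovered subset 136.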