16 new quasars at the end of the reionization unveiled by self-supervised learning

L. N. Martínez-Ramírez, Julien Wolf, Silvia Belladitta, Eduardo Bañados, F. E. Bauer, Raphael E. Hviding, Daniel Stern, Chiara Mazzucchelli, Romain A. Meyer, Ezequiel Treister, Federica Loiacono

Abstract

Luminous quasars at $z > 6$ are key probes of early supermassive black hole (SMBH) growth, massive galaxy evolution, and intergalactic medium properties during cosmic reionization. However, their discovery is very challenging due to their scarcity and overwhelming contamination, as foreground ultracool dwarfs (UCDs) outnumber $z>6$ quasars by 2-4 orders of magnitude. In this work, we leverage the extensive coverage of DESI Legacy Survey DR10 to conduct a self-supervised search for quasars at $z > 6$, directly analyzing multiband optical images and minimizing the biases of traditional catalog-driven color-color selection criteria. By applying a contrastive learning (CL) method followed by spectral energy distribution (SED) fitting prioritization, we identified 1139 high-priority quasar candidates, for which we expect a competitive $\sim$1:1 quasar-to-UCD ratio based on literature samples. We spectroscopically confirm 16 new quasars at $z = 5.94$-6.45, achieving a 45\% success rate. Remarkably, all 16 objects are relatively bright ($M_{1450} < -25.5$) quasars, including several with unusual properties such as narrow Ly$\alpha$ emission (FWHM $< 2600$ km s$^{-1}$), strong Ly$\alpha$+N V emission with equivalent width $>100$ Å, and mildly red observed-frame near-infrared (NIR) continua ($z - J > 0.4$). Notably, three of them would have been missed by traditional color-color selections. These results highlight the power of self-supervised machine learning combined with SED fitting prioritization to uncover rare distant sources beyond conventional techniques. Our approach offers a scalable and robust framework for data mining and can be readily extended to forthcoming wide-field surveys such as Rubin/LSST, 4MOST, Euclid, and Roman, improving the census of high-redshift quasars and constraints on SMBH formation and evolution in the first billion years of the Universe.
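
As a rough illustration of the contrastive learning step, the sketch below computes a SimCLR-style NT-Xent loss on two batches of embeddings. The abstract does not state which contrastive loss is used, so the choice of NT-Xent, the function name `nt_xent_loss`, and the `temperature` value are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for two batches of embeddings of shape (N, D).

    Row i of z_a and row i of z_b are assumed to be embeddings of two
    augmented views of the same image (a positive pair). All other rows
    in the combined batch act as negatives.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own negative

    # Index of the positive partner for each of the 2N rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable row-wise log-softmax.
    sim_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    log_prob = sim - sim_max - log_norm

    # Average negative log-probability assigned to the positive partner.
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

The loss is small when the two views of each source sit close together in the embedding and far from the other sources in the batch, which is the property that lets quasars and UCDs separate without labels.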

Paper Structure (26 sections, 8 equations, 17 figures, 7 tables)

Figures (17)

  • Figure 1: $z$-band $10\arcsec\times10\arcsec$ stamps illustrating examples of the artifacts identified in the photometry through visual inspection of the tensor. Light blue circles indicate the $1\arcsec$ and $3\arcsec$ radius apertures used in the compactness criteria calculation. The C$_{1\arcsec/3\arcsec}$ values are presented above the corresponding postage stamp.
  • Figure 2: Architecture of the self-supervised CL framework used in this work. The top and left sections illustrate the image preprocessing and the tensor assembly steps. The structure of the network is shown sequentially from left to right, starting with data augmentation, then the encoder block, and lastly the projection head. Red and green arrows indicate the contrastive loss computation, which measures similarity between random pairs of augmented images.
  • Figure 3: Embedded Euclidean space generated by UMAP for LS DR10 $i$-dropout sources after training the CL algorithm. All sources are represented as gray contours given by the number density; known M, L, and T dwarfs are shown as brown, black, and red crosses, respectively, and Gaia DR3 stars as yellow crosses. Spectroscopically confirmed quasars from the literature and the new quasars from this work are represented by filled circles and stars, respectively, all of them color-coded by their redshift. Quasar candidates are selected from the lower part of the left island, where the source density is low and the contamination ratio with UCDs is $\sim$1:1. The two regions with the highest concentration of quasars are highlighted with black squares and presented as zoom-in panels.
  • Figure 4: Mean pixel-by-pixel $z$, $r$, and $g$ band fluxes (RGB mapped) within the $15 \times 15$ binned embedded space. The number of sources within each bin contributing to the mean is shown in white in the upper left corner of each image. The red dashed lines highlight the region with the highest number of known quasars, with a zoom-in of the binned embedded space on the right side. In the zoom-in plot, the number of known quasars in each bin is shown in orange, below the number of sources in white.
  • Figure 5: Example of an SED fitting result and the photometric redshift probability distribution function (top panels), together with the LS DR10, VHS DR7, and WISE postage stamps of the photometry used (bottom panel), for the source LS J000-79. Top left: The blue curve represents the best-fit model given by the QSO1 template (see salvato2022erosita for details) at redshift 6.096, while the red curve shows the best-fit brown dwarf template from meisner2021new. The observed photometry, represented by black squares with error bars, corresponds to the DECam, VHS, and WISE catalogs and their uncertainties, while the yellow circles are the expected flux densities assuming the best quasar model. Top right: photometric redshift probability distribution function. Although the distribution extends over $z = [0,12]$, we limited the plot to the range $z = [4,9]$ for better visualization.
  • ...and 12 more figures
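
The binned view described in the Figure 4 caption (mean pixel-by-pixel stamps within a $15 \times 15$ grid over the embedded space) can be sketched as below. This is a minimal illustration under stated assumptions: the function name `binned_mean_stamps`, the array layout (a 2D embedding plus image stamps with the bands on the last axis), and the use of equal-width bins spanning the embedding range are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def binned_mean_stamps(embedding, stamps, n_bins=15):
    """Average image stamps within a regular grid over a 2D embedding.

    embedding : (N, 2) array of embedded coordinates (e.g. from UMAP).
    stamps    : (N, H, W, C) array of image cutouts, one per source.
    Returns (means, counts): means has shape (n_bins, n_bins, H, W, C)
    with NaN in empty bins; counts has shape (n_bins, n_bins).
    """
    x, y = embedding[:, 0], embedding[:, 1]
    x_edges = np.linspace(x.min(), x.max(), n_bins + 1)
    y_edges = np.linspace(y.min(), y.max(), n_bins + 1)

    # Bin index per source; clip so the maximum value falls in the last bin.
    ix = np.clip(np.digitize(x, x_edges) - 1, 0, n_bins - 1)
    iy = np.clip(np.digitize(y, y_edges) - 1, 0, n_bins - 1)

    sums = np.zeros((n_bins, n_bins) + stamps.shape[1:], dtype=float)
    counts = np.zeros((n_bins, n_bins), dtype=int)
    np.add.at(sums, (ix, iy), stamps)   # unbuffered add handles repeated bins
    np.add.at(counts, (ix, iy), 1)

    means = np.full_like(sums, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled].reshape(-1, 1, 1, 1)
    return means, counts
```

The `counts` array corresponds to the per-bin source numbers printed on each image in the figure. Mapping the last axis of `means` to RGB in the order ($z$, $r$, $g$) would follow the caption.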