Counterpart identification and classification for eRASS1 and characterisation of the AGN content

M. Salvato, J. Wolf, T. Dwelly, H. Starck, J. Buchner, R. Shirley, A. Merloni, A. Georgakakis, F. Balzer, M. Brusa, A. Rau, S. Freund, D. Lang, T. Liu, G. Lamer, A. Schwope, W. Roster, S. Waddell, M. Scialpi, Z. Igo, M. Kluge, F. Mannucci, S. Tiwari, D. Homan, M. Krumpe, A. Zenteno, D. Hernandez-Lang, J. Comparat, M. Fabricius, J. Snigula, D. Schlegel, B. A. Weaver, R. Zhou, A. Dey, F. Valdes, A. Myers, S. Juneau, H. Winkler, I. Marquez, F. di Mille, S. Ciroi, M. Schramm, D. A. H. Buckley, J. Brink, M. Gromadzki, J. Robrade, K. Nandra

TL;DR

This work tackles building a large, clean AGN sample from eRASS1. It delivers optical/IR counterpart identification with NWAY and trained priors across LS10, Gaia DR3, and CatWISE2020 (CW2020), and separates Galactic from extragalactic sources with a machine-learning classifier. It pairs this with photometric redshift estimation via Circlez for the extragalactic LS10 sources, producing three high-quality counterpart catalogues plus training and validation sets as benchmarks. The study shows that LS10 provides the most reliable associations, while Gaia DR3 and CW2020 contribute complementary matches, enabling the construction of multiple well-defined AGN samples with tunable completeness and purity. The released catalogues, redshift information, and accompanying tools let researchers assemble tailored AGN samples and advance statistical studies of AGN demographics in the eROSITA era. Overall, the paper delivers a practical, scalable framework for counterpart identification, source classification, and redshift estimation in a large all-sky X-ray survey context.

Abstract

[abridged] Accurately accounting for the AGN phase in galaxy evolution requires a large, clean AGN sample. This is now possible with SRG/eROSITA. The public Data Release 1 (DR1, Jan 31, 2024) includes 930,203 sources from the Western Galactic Hemisphere. The data enable the selection of a large AGN sample and the discovery of rare sources. However, scientific return depends on accurate characterisation of the X-ray emitters, requiring high-quality multiwavelength data. This paper presents the identification and classification of optical and infrared counterparts to eRASS1 sources using Gaia DR3, CatWISE2020, and Legacy Survey DR10 (LS10) with the Bayesian NWAY algorithm and trained priors. Sources were classified as Galactic or extragalactic via a machine-learning model combining optical/IR and X-ray properties, trained on a reference sample. For extragalactic LS10 sources, photometric redshifts were computed using Circlez. Within the LS10 footprint, all 656,614 eROSITA/DR1 sources have at least one possible optical counterpart; about 570,000 are extragalactic and likely AGN. Half are new detections compared to AllWISE, Gaia, and Quaia AGN catalogues. Gaia and CatWISE2020 counterparts are less reliable, due to the surveys' shallowness and the limited number of features available to assess the probability of being an X-ray emitter. In the Galactic Plane, where the overdensity of stellar sources also increases the chance of associations, using conservative reliability cuts, we identify approximately 18,000 Gaia and 55,000 CatWISE2020 extragalactic sources. We release three high-quality counterpart catalogues, plus the training and validation sets, as a benchmark for the field. These datasets have many applications, but in particular empower researchers to build AGN samples tailored for completeness and purity, accelerating the hunt for the Universe's most energetic engines.

Paper Structure

This paper contains 42 sections, 6 equations, 18 figures, 6 tables.

Figures (18)

  • Figure 1: Number of X-ray sources detected in eRASS1 per contiguous eROSITA tile ($3^\circ \times 3^\circ$), over the entire eROSITA_DE region in zenithal equal area (ZEA) projection. The highest density is at the South Ecliptic Pole (SEP). The map is shown in Galactic coordinates.
  • Figure 2: Source density per eROSITA sky tile ($3^\circ \times 3^\circ$) of LS DR10 (LS10; left panel), Gaia DR3 (GDR3; middle panel) and CatWISE2020 (CW2020; right panel). On the LS10 map, the dark green (magenta) regions of InAllLS10 (InAnyLS10), indicating whether all (at least one of) the LS10 bands reach the nominal depth of the survey (see Section \ref{sec:limitations}), are overplotted.
  • Figure 3: Confusion matrices resulting from the Random Forest classification on the validation test sets are shown for LS10 (left panel), GDR3 (middle panel), and CW2020 (right panel). The colour scale in the panels represents the number of sources per cell, with darker shades indicating higher counts. The entries along the main diagonal (top-left to bottom-right) indicate correctly classified sources. The different total number of sources across the three matrices reflects the variations in survey depth, spatial resolution, and source density among the three surveys.
  • Figure 4: Histogram distribution of the probability weighting (i.e., bias) introduced by the priors for the LS10, GDR3 and CW2020 counterparts to eRASS1. The left panel shows the distribution of the bias for the actual eRASS1 counterparts, while the right panel displays the distribution for counterparts to the eRASS1 shifted positions. A value of 1 indicates no change in the probability of being the right counterpart, i.e., the probability is based solely on the distance to the X-ray position, positional uncertainties, and source number densities. Values above (below) 1 indicate that the prior has degraded (reinforced) the probability. Almost 50% of the counterparts from GDR3 were degraded once the prior was considered. More details are given in the main text (Section \ref{subsub:astrometry_prior}).
  • Figure 5: Mean p_any distribution per eROSITA tile for real (top row) and random (bottom row) eROSITA coordinates, using ancillary data from LS10 (left), GDR3 (middle) and CW2020 (right). The colour scale is consistent across all panels to facilitate comparison. While the p_any values for the real sources are generally higher than those for the random positions, they can approach similar levels in regions of high source density, indicating an increased risk of chance associations (see the main text for further discussion).
  • ...and 13 more figures