Deblending the MIGHTEE-COSMOS survey with XID+: The resolved radio source counts to $S_{1.4}\approx 5\,\mu$Jy

Eliab D. Malefahlo, Matt J. Jarvis, Mario G. Santos, Catherine Cress, Daniel J. B. Smith, Catherine Hale, José Afonso, Imogen H. Whittam, Mattia Vaccari, Ian Heywood, Shuowen Jin, Fangxia An

TL;DR

This work tackles source confusion in deep radio surveys by developing a complete deblending framework based on XID+ that uses high-purity, multi-wavelength priors to recover reliable flux densities and source counts in the MIGHTEE-COSMOS field. Using realistic T-RECS-based simulations, the authors show that a high-purity prior combined with a $50\,\mu$Jy masking threshold yields the best flux recovery and reproduces the input counts down to $\sim3\sigma$ ($\approx3.9\,\mu$Jy). Applied to the real data, the method produces a final catalog of 89,562 sources and a high-fidelity subset of 20,757. Validation against independent analyses and P(D) results shows that the deblended counts closely trace the underlying population and resolve the radio background to $\sim4.8\,\mu$Jy, while robustness checks confirm resilience to spatial noise variations and modest astrometric offsets. The methodology offers a practical path forward for confusion-limited surveys and can be extended to other facilities, with Bayesian diagnostics (p-value residuals) guiding fit reliability and potential future refinements such as enhanced priors from full SED modeling.

Abstract

Deep radio continuum surveys provide fundamental constraints on galaxy evolution, but source confusion limits sensitivity to the faintest sources. We present a complete framework for producing high-fidelity deblended radio catalogs from the confused MIGHTEE maps using the probabilistic deblending framework XID+ and prior positions from deep multi-wavelength data in the COSMOS field. To assess performance, we construct MIGHTEE-like simulations based on the Tiered Radio Extragalactic Continuum Simulation (T-RECS) radio source population, ensuring a realistic distribution of star-forming galaxies and active galactic nuclei (AGN) for validation. Through these simulations, we show that prior catalog purity is the dominant factor controlling deblending accuracy: a high-purity prior, containing only sources with a high likelihood of radio detection, recovers accurate flux densities and reproduces input source counts down to $\sim 3\sigma$ (where $\sigma$ is the thermal noise). On the other hand, a complete prior overestimates the source counts due to spurious detections. Our optimal strategy combines the high-purity prior with a mask that removes sources detected above $50\,\mu$Jy. Applied to the $\sim 1.3$ deg$^2$ area of the MIGHTEE-COSMOS field defined by overlapping multi-wavelength data, this procedure yields a deblended catalog of 89,562 sources. The derived 1.4 GHz source counts agree with independent P(D) analyses and indicate that we resolve the radio background to $\sim 4.8\,\mu$Jy. We also define a recommended high-fidelity sample of 20,757 sources, based on detection significance, flux density, and goodness-of-fit, which provides reliable flux densities for individual sources in the confusion-limited regime.
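
The abstract reports 1.4 GHz source counts derived from the deblended catalog. As a rough illustration of how such counts are commonly tabulated, the sketch below bins a list of flux densities and computes Euclidean-normalised differential counts, $S^{2.5}\,dN/dS$, with Poisson uncertainties. This is not the authors' pipeline: the function name `euclidean_counts`, the geometric bin centres, the Euclidean normalisation itself, and the absence of any completeness or reliability correction are all assumptions made for illustration.

```python
import numpy as np


def euclidean_counts(flux_jy, area_deg2, bin_edges_jy):
    """Euclidean-normalised differential counts S^2.5 dN/dS (Jy^1.5 sr^-1).

    Hypothetical helper: no completeness, reliability, or Eddington-bias
    corrections are applied.
    """
    flux = np.asarray(flux_jy, dtype=float)
    edges = np.asarray(bin_edges_jy, dtype=float)

    # Survey area converted from square degrees to steradians.
    area_sr = area_deg2 * (np.pi / 180.0) ** 2

    # Raw number of sources per flux-density bin.
    n, _ = np.histogram(flux, bins=edges)

    ds = np.diff(edges)                      # bin widths in Jy
    s_mid = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centres in Jy

    norm = s_mid ** 2.5 / (ds * area_sr)
    counts = norm * n
    errors = norm * np.sqrt(n)               # simple Poisson uncertainty
    return s_mid, counts, errors
```

For example, `euclidean_counts(catalog_flux_jy, 1.3, np.logspace(-5.5, -3, 11))` would tabulate ten logarithmic bins over the $\sim 1.3$ deg$^2$ area quoted in the abstract, where `catalog_flux_jy` is a placeholder for an array of catalog flux densities in Jy.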

Paper Structure

This paper contains 47 sections, 2 equations, 23 figures, 5 tables.

Figures (23)

  • Figure 1: A projection of sources onto the first two dimensions from the Uniform Manifold Approximation and Projection (UMAP) algorithm. These UMAP axes are abstract dimensions derived from the input photometry. Colors represent classifications by HDBSCAN. The main population of galaxies is shown in blue (Cluster 0). The other colors represent distinct populations of stellar sources identified by the clustering algorithm.
  • Figure 2: $J-K$ vs. $g-i$ color-color diagram for the 492,741 sources. The underlying greyscale density map shows the distribution of the full sample. The overlaid contours trace the density of sources classified as galaxies (red) and stars (blue), illustrating the separation achieved by the high-dimensional analysis.
  • Figure 3: The distribution of the galaxy sample in the total infrared luminosity ($L_{\mathrm{IR}}$) versus redshift plane. Sources are classified based on their predicted 1.4 GHz flux density relative to the MIGHTEE sensitivity limit. Blue dots represent sources classified as 'Definitely Detectable' ($S_{\rm pred} > 1\,\mu$Jy), orange dots indicate 'Possibly Detectable' sources (where the upper $L_{\mathrm{IR}}$ uncertainty bound implies $S_{\rm pred} > 1\,\mu$Jy), and grey dots show 'Undetectable' sources.
  • Figure 4: Examples of the complex, multi-component AGN morphologies generated by our simulation pipeline. The top row displays four Fanaroff-Riley Type I (FRI) sources, modeled with bent jets. The bottom row displays four Fanaroff-Riley Type II (FRII) sources, featuring distinct lobes and hotspots. All images are $50 \times 50$ pixels in size, with a pixel scale of 1.1 arcsec.
  • Figure 5: Left: Comparison of input T-RECS catalog flux densities versus flux densities recovered by a multi-stage PyBDSF source-finding process on the simulated map. Sources are colored by type: Compact SFGs (blue), Extended SFGs (cyan), Compact AGNs (green), and Extended AGNs (red). The dashed black line indicates a perfect 1:1 correlation. Right: The log-ratio of recovered PyBDSF-to-true flux versus true flux for all sources (grey points). The solid red line is the running median of this ratio, with the dashed red lines showing the 16th-84th percentile range. The solid blue line indicates a perfect recovery ratio of 1.
  • ...and 18 more figures
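
The three-way split described in the Figure 3 caption can be sketched as a simple threshold test. The example below assumes that a predicted 1.4 GHz flux density and its upper bound (from the upper $L_{\mathrm{IR}}$ uncertainty) are already available for each source; how those predictions are derived from $L_{\mathrm{IR}}$ is not given in this excerpt, so the function name `classify_detectability` and its inputs are hypothetical. Only the $1\,\mu$Jy threshold and the three class labels are taken from the caption.

```python
import numpy as np


def classify_detectability(s_pred_ujy, s_pred_upper_ujy, limit_ujy=1.0):
    """Label sources following the classes named in the Figure 3 caption.

    Hypothetical helper: inputs are predicted 1.4 GHz flux densities (uJy)
    and the values implied by the upper L_IR uncertainty bound.
    """
    s_pred = np.asarray(s_pred_ujy, dtype=float)
    s_upper = np.asarray(s_pred_upper_ujy, dtype=float)

    # Start with everything undetectable, then promote in order of strictness.
    labels = np.full(s_pred.shape, "Undetectable", dtype=object)
    labels[s_upper > limit_ujy] = "Possibly Detectable"
    labels[s_pred > limit_ujy] = "Definitely Detectable"
    return labels
```

Applying the upper-bound test first and the nominal test second means a source whose nominal prediction already exceeds the limit always ends up in the stricter class.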