
Clustering-based redshift estimation: method and application to data

Brice Ménard, Ryan Scranton, Samuel Schmidt, Chris Morrison, Donghui Jeong, Tamas Budavari, Mubdi Rahman

TL;DR

The paper introduces a practical, data-driven method to infer the redshift distribution of arbitrary datasets by exploiting spatial cross-correlations with reference populations of known redshifts, using clustering information on all scales and local sampling in photometric space to map photometric observables onto redshift space. It formalizes the relation between sky covariance and redshift distributions, analyzes ideal and non-ideal cases, and then demonstrates the approach on real datasets including LRGs, ELGs, WISE infrared sources, and FIRST radio sources, validating consistency with independent redshift estimates where possible. The study shows the method can recover redshift distributions or at least constrain redshift ranges, particularly when distributions are narrow, and highlights practical strategies and limitations (bias evolution, broad/multi-peaked distributions) with cross-checks using multiple reference samples. Overall, clustering-based redshift estimation offers a scalable way to access the three-dimensional structure of large sky surveys, enabling richer scientific analyses in the absence of complete spectroscopy or perfect photometric redshifts.
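The core idea is that sources from an unknown sample cluster only with reference objects at the same redshift. It can be illustrated with a minimal pair-counting sketch. Everything specific here is an illustrative assumption and not the authors' estimator: the function names `cluster_z_amplitudes` and `normalized_dndz`, the flat-sky Euclidean separations, the absence of edge correction, and the per-bin angular annulus `theta_min`/`theta_max`. In practice that annulus would be the angular equivalent of a fixed physical scale range, computed with an assumed cosmology. The excess neighbor count in each reference-redshift bin scales roughly as ${\rm dN/d}z$ times the clustering amplitudes of both populations. The normalization step simply assumes those amplitudes are constant with redshift.

```python
import numpy as np


def cluster_z_amplitudes(ref_xy, ref_z, unk_xy, area, z_edges, theta_min, theta_max):
    """Mean fractional excess of unknown sources around reference objects,
    per reference-redshift bin (hypothetical helper; flat sky, no edge correction)."""
    ref_xy = np.asarray(ref_xy, dtype=float)
    ref_z = np.asarray(ref_z, dtype=float)
    unk_xy = np.asarray(unk_xy, dtype=float)
    n_bins = len(z_edges) - 1
    # allow a scalar or one annulus per bin
    theta_min = np.broadcast_to(theta_min, (n_bins,))
    theta_max = np.broadcast_to(theta_max, (n_bins,))
    density = len(unk_xy) / area  # mean surface density of unknown sources
    amps = np.full(n_bins, np.nan)
    for i in range(n_bins):
        sel = (ref_z >= z_edges[i]) & (ref_z < z_edges[i + 1])
        n_ref = int(sel.sum())
        if n_ref == 0:
            continue  # no reference objects in this slice
        # pairwise separations, shape (n_ref, n_unk)
        diff = ref_xy[sel][:, None, :] - unk_xy[None, :, :]
        sep = np.sqrt(np.sum(diff ** 2, axis=-1))
        pairs = np.count_nonzero((sep >= theta_min[i]) & (sep < theta_max[i]))
        # pairs expected if unknown sources were unclustered
        expected = n_ref * density * np.pi * (theta_max[i] ** 2 - theta_min[i] ** 2)
        amps[i] = pairs / expected - 1.0
    return amps


def normalized_dndz(amps, z_edges):
    """Unit-integral dN/dz, assuming redshift-independent clustering amplitudes."""
    dz = np.diff(z_edges)
    amps = np.nan_to_num(amps)  # empty bins contribute nothing
    return amps / np.sum(amps * dz)
```

Correcting for bias evolution, as the paper discusses, would require dividing the amplitudes by a model of the clustering of both samples before normalizing. A tree-based neighbor search would replace the brute-force separation matrix for realistic sample sizes.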

Abstract

We present a data-driven method to infer the redshift distribution of an arbitrary dataset based on spatial cross-correlation with a reference population, and we apply it to various datasets across the electromagnetic spectrum to show its potential and limitations. Our approach advocates the use of clustering measurements on all available scales, in contrast to previous works that focused only on linear scales. We also show how its accuracy can be enhanced by optimally sampling a dataset within its photometric space rather than applying the estimator globally. We show that the ultimate goal of this technique is to characterize the mapping between the space of photometric observables and redshift space, since this characterization then allows us to infer the clustering-redshift p.d.f. of a single galaxy. We apply this technique to estimate the redshift distributions of luminous red galaxies and emission line galaxies from the SDSS, infrared sources from WISE and radio sources from FIRST. We show that consistent redshift distributions are found using both quasars and absorber systems as reference populations. This technique provides valuable information about the third dimension of astronomical datasets and is widely applicable to a large range of extragalactic surveys.
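The idea of sampling a dataset within its photometric space, rather than applying the estimator globally, can be sketched as follows. Split the sample into cells of a photometric observable and estimate a clustering-redshift distribution per cell, for example with a pair-counting estimator run on each cell separately. Each galaxy then inherits its cell's distribution as a per-galaxy p.d.f., and the global ${\rm dN/d}z$ is the occupancy-weighted sum over cells. The equal-occupancy 1D binning and the names `assign_cells` and `combine_cells` are hypothetical stand-ins for illustration, not the optimal sampling scheme used in the paper.

```python
import numpy as np


def assign_cells(colors, n_cells):
    """Assign each galaxy to one of n_cells equal-occupancy bins of a
    1D photometric quantity (hypothetical, simplified scheme)."""
    colors = np.asarray(colors, dtype=float)
    edges = np.quantile(colors, np.linspace(0.0, 1.0, n_cells + 1))
    idx = np.searchsorted(edges, colors, side="right") - 1
    # the maximum value lands one past the last cell; fold it back in
    return np.clip(idx, 0, n_cells - 1), edges


def combine_cells(cell_idx, cell_dndz, dz):
    """Turn per-cell dN/dz estimates (n_cells, n_zbins) into a global dN/dz
    and a per-galaxy redshift p.d.f."""
    cell_dndz = np.asarray(cell_dndz, dtype=float)
    # normalize each cell's distribution to unit integral over redshift
    cell_pdf = cell_dndz / np.sum(cell_dndz * dz, axis=1, keepdims=True)
    counts = np.bincount(cell_idx, minlength=cell_pdf.shape[0])
    weights = counts / counts.sum()  # fraction of galaxies in each cell
    global_dndz = weights @ cell_pdf
    per_galaxy_pdf = cell_pdf[cell_idx]  # each galaxy inherits its cell's p.d.f.
    return global_dndz, per_galaxy_pdf
```

Finer cells sharpen the per-galaxy p.d.f. but leave fewer objects per cell for the cross-correlation measurement, which is the trade-off any practical sampling scheme has to balance.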

Paper Structure

This paper contains 15 sections, 15 equations, 4 figures.

Figures (4)

  • Figure 1: Offset in the estimation of the mean redshift of a sample due to the lack of knowledge of its clustering amplitude ${\overline b_u}(z)$. The figure shows different scenarios, ${\overline b_u}\propto z^{1/2}$, $z$ and $z^2$, for different fiducial populations with redshift distributions characterized by Gaussians with mean redshift $z_0$ and half width $\sigma_z$. For the broad range of parameters considered, the error induced by assuming a non-evolving ${\overline b_u}$ is small enough for a wide range of astrophysical studies.
  • Figure 2: Compilation of samples from the SDSS for which we have a robust 3D position, from either spectroscopic or photometric redshifts. In this paper we use the spectroscopic samples of quasars and Mg II absorbers, shown with the dark blue and brown curves.
  • Figure 3: Redshift distributions ${\rm dN/d}z$ (normalized to unity) for Luminous Red Galaxies (LRGs). In both panels the solid red line shows the distribution of LRG photometric redshifts. Left: cluster-z distribution (black points) obtained by measuring the spatial cross-correlation between LRGs and SDSS quasars. Right: cluster-z distribution (black points) obtained by measuring the spatial cross-correlation between LRGs and Mg II absorbers, spanning the range $0.4<z<2$.
  • Figure 5: Left: Redshift distributions ${\rm dN/d}z$ (normalized to unity) for three subsamples of WISE sources obtained by measuring their spatial cross-correlation with SDSS quasars. The selection criteria for the red (Sample 1), blue (Sample 2) and green (Sample 3) samples are given by the WISE selection equation in the main text. Right: Redshift distribution of FIRST radio sources obtained by measuring their spatial cross-correlation with SDSS quasars. We observe sources up to $z\sim3$ as well as a bimodal redshift distribution.
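The offset quantified in Figure 1 can be reproduced with a short numerical calculation. It assumes that the recovered clustering signal is proportional to ${\overline b_u}(z)\,{\rm dN/d}z$, so that treating ${\overline b_u}$ as constant yields a mean redshift weighted by the bias. It also takes $\sigma_z$ as the Gaussian standard deviation and ${\overline b_u}\propto z^\alpha$. The function name `mean_z_offset` and the grid settings are illustrative assumptions. For $\alpha=1$ the offset has the closed form $\sigma_z^2/z_0$, which serves as a check.

```python
import numpy as np


def mean_z_offset(z0, sigma_z, alpha, n_grid=20001):
    """Shift in mean redshift when a bias b_u(z) ∝ z**alpha is ignored
    (hypothetical helper; Gaussian dN/dz with mean z0 and std sigma_z)."""
    # grid spanning +/- 8 sigma, kept at z > 0 so z**alpha stays defined
    z = np.linspace(max(z0 - 8 * sigma_z, 1e-6), z0 + 8 * sigma_z, n_grid)
    dndz = np.exp(-0.5 * ((z - z0) / sigma_z) ** 2)  # unnormalized Gaussian
    signal = z ** alpha * dndz  # what a constant-bias analysis reads as dN/dz
    # uniform grid: the spacing cancels in each ratio of sums
    true_mean = np.sum(z * dndz) / np.sum(dndz)
    inferred_mean = np.sum(z * signal) / np.sum(signal)
    return inferred_mean - true_mean


for alpha in (0.5, 1.0, 2.0):
    print(alpha, mean_z_offset(1.0, 0.1, alpha))
```

For $z_0=1$ and $\sigma_z=0.1$ with $\alpha=1$, the offset is $0.01$, i.e. one percent of the mean redshift. Narrow distributions suppress the effect because the offset scales as $\sigma_z^2$.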