The-wiZZ: Clustering redshift estimation for everyone

Christopher B. Morrison, Hendrik Hildebrandt, Samuel J. Schmidt, Ivan K. Baldry, Maciej Bilicki, Ami Choi, Thomas Erben, Peter Schneider

TL;DR

Clustering redshift estimation provides redshift distributions for galaxies without spectroscopy. The-wiZZ introduces a fast, end-user workflow that precomputes reference-unknown pairs and lets users generate clustering redshift PDFs for any subsample without re-running correlations. It demonstrates consistency with KiDS-450 cosmic shear redshift distributions and extends to single-galaxy clustering redshifts via a kdTree approach, with simple bias-mitigation options. The method is open-source and scalable to future surveys like LSST, Euclid, and WFIRST, providing a practical path to accurate redshift distributions for large photometric datasets.
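The single-galaxy extension works by selecting, for each input galaxy, a set of photometrically self-similar galaxies and treating that set as a subsample. Below is a minimal Python sketch of the selection step. It makes three assumptions: each catalog row holds color and magnitude values; "normalized units" means each column is rescaled to zero mean and unit standard deviation; and the function names `normalize` and `self_similar` are hypothetical. A brute-force search stands in for the kdTree query. It returns the same set of nearest neighbours (up to ties), only more slowly.

```python
import math
import statistics


def normalize(catalog):
    """Rescale each column (e.g. colors, magnitudes) to zero mean and unit
    standard deviation, so distances are in 'normalized units' (assumption)."""
    cols = list(zip(*catalog))
    means = [statistics.fmean(c) for c in cols]
    # Guard against constant columns, which would give a zero deviation.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in catalog]


def self_similar(normed, target_index, k):
    """Return the indices of the k objects closest to normed[target_index]
    (the target itself included) and the median of their distances.
    Brute force here; a kdTree query would return the same neighbours."""
    target = normed[target_index]
    nearest = sorted(
        (math.dist(target, row), i) for i, row in enumerate(normed)
    )[:k]
    indices = [i for _, i in nearest]
    median_dist = statistics.median(d for d, _ in nearest)
    return indices, median_dist
```

Figure 5 uses 4 096 neighbours per input object. The median distance returned here is the kind of quantity histogrammed there.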

Abstract

We present The-wiZZ, an open-source, user-friendly software package for estimating the redshift distributions of photometric galaxies with unknown redshifts by spatially cross-correlating them against a reference sample with known redshifts. The main benefit of The-wiZZ is that it separates the angular pair finding and correlation estimation from the computation of the output clustering redshifts, allowing anyone to create a clustering redshift for their sample without the intervention of an "expert". It allows the end user of a given survey to select any sub-sample of photometric galaxies with unknown redshifts, match this sample's catalog indices into a value-added data file, and produce a clustering redshift estimate for this sample in a fraction of the time it would take to run all the angular correlations needed to produce a clustering redshift. We show results with this software using photometric data from the Kilo-Degree Survey (KiDS) and spectroscopic redshifts from the Galaxy and Mass Assembly (GAMA) survey and the Sloan Digital Sky Survey (SDSS). The results we present for KiDS are consistent with the redshift distributions used in a recent cosmic shear analysis from the survey. We also present results using a hybrid machine-learning and clustering-redshift analysis that enables the estimation of clustering redshifts for individual galaxies. The-wiZZ can be downloaded at http://github.com/morriscb/The-wiZZ/.
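A toy Python version can illustrate the split between survey-side pair finding and end-user estimation. This sketch does not reproduce The-wiZZ's actual data format or API. Its precomputed store records, for each reference object, the object's redshift and the catalog indices of the unknown objects found within an angular annulus around it. The end user then only intersects their subsample's indices with those stored sets. The overdensity estimator is a deliberate simplification: matched pairs divided by a user-supplied expected pair count per reference object, minus one. All names (`build_pair_store`, `clustering_z`, `expected_per_ref`) are hypothetical.

```python
import bisect


def build_pair_store(ref_redshifts, pairs):
    """Survey-side precompute (toy stand-in for the pair finding).
    ref_redshifts: redshift of each reference object.
    pairs: dict mapping reference index -> unknown-object indices found
    within the chosen angular annulus around that reference object."""
    return [(z, frozenset(pairs.get(i, ()))) for i, z in enumerate(ref_redshifts)]


def clustering_z(store, subsample, z_edges, expected_per_ref):
    """End-user step: match a subsample's catalog indices into the store and
    return a simplified pair overdensity per half-open redshift bin.
    expected_per_ref: pairs expected per reference object if the subsample
    were unclustered (e.g. estimated from randoms); an assumption here."""
    subsample = set(subsample)
    n_bins = len(z_edges) - 1
    pair_counts = [0] * n_bins
    ref_counts = [0] * n_bins
    for z, unknown_ids in store:
        b = bisect.bisect_right(z_edges, z) - 1
        if 0 <= b < n_bins:
            # No angular distances are recomputed: only set intersections.
            pair_counts[b] += len(unknown_ids & subsample)
            ref_counts[b] += 1
    return [
        pc / (rc * expected_per_ref) - 1.0 if rc else float("nan")
        for pc, rc in zip(pair_counts, ref_counts)
    ]
```

Calling `clustering_z` again with a different subsample reuses the same store. That reuse is the point of the design: no angular pair finding has to be repeated for new selections.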

Paper Structure

This paper contains 20 sections, 1 equation, 8 figures, 1 table, and 2 algorithms.

Figures (8)

  • Figure 1: Flow chart of the inputs and outputs of The-wiZZ. The upper left shows the work done by an individual survey: spatially masking the catalog, running pair_maker, and creating The-wiZZ's output HDF5 data file. The upper right shows a user selecting a sample from the masked catalog for their own work. The lower portion shows the end user matching their specific sample into the data file using pdf_maker and producing the resulting clustering redshift estimate for their sample without having to run any cross-correlations.
  • Figure 2: Symmetric-log plot of the number of spectra overlapping with the current KiDS coverage as a function of redshift. The GAMA survey is the dominant sample at low redshift, with SDSS dominating at high redshift. The large number of objects above $z=1.0$ are spectroscopic QSOs. The data are binned linearly in $\ln(1+z)$ and follow the exact binning that we later use for our clustering redshifts. Above the dotted line the data are plotted logarithmically in $N(z)$; below it they are plotted linearly. In this plot, galaxies in the GAMA catalog that have spectroscopic redshifts from SDSS are shown as SDSS galaxies.
  • Figure 3: Raw and summed clustering-$z$s produced by The-wiZZ, normalized into estimated PDFs, using objects from KiDS selected in $z_{\rm B}$ as the unknown sample and GAMA and SDSS spectra as the reference sample. Colored bands are clustering-$z$s from selections in $z_{\rm B}$ mimicking the bins of hildebrandt16 (CS bins). The light grey regions show the selection in photo-$z$. Grey dashed lines are the clustering-$z$s produced by dividing each CS bin into 4 sub-bins of width $\Delta z_{\rm B}=0.05$, each normalized by its number of objects relative to the CS bin. The grey dashed lines appear to sum to the CS bin as a function of redshift, suggesting that the galaxy bias in the clustering redshift estimate is well behaved. Black data points are the clustering-$z$ that results from normalizing, summing, and averaging the individual spatial bootstraps of the sub-bins into the full CS bin. The bins were all selected from the same catalog and use the same The-wiZZ data file, demonstrating how clustering-$z$s can be quickly created for a variety of samples using The-wiZZ.
  • Figure 4: Raw and summed clustering-$z$s for the total bin spanning $0.1<z_{\rm B}\leq0.9$. The orange band shows the clustering-$z$ result from running The-wiZZ on the full $0.1<z_{\rm B}\leq0.9$ sample. The black data points are the clustering-$z$ created by summing the spatial bootstraps of the clustering-$z$s from the 16 sub-bins of width $\Delta z_{\rm B}=0.05$ into the total bin. The light-grey region shows the $z_{\rm B}$ selection. We sum the clustering-$z$s in this manner to mitigate galaxy bias, as suggested in menard13 and schmidt13. These works show that clustering-$z$s of wide distributions in $z$ are much more susceptible to galaxy bias than those of narrow selections. The difference between the raw and summed clustering-$z$s shows the effect of this bias mitigation. The design of The-wiZZ is well suited to bias mitigation strategies such as this.
  • Figure 5: Symmetric-log plot of the histograms of the median distance, in normalized units, of the 4 096 objects matched to the input object by the kdTree. The different curves correspond to the photo-$z$-selected galaxy samples we use to test the method. That most of the median distances are below $1\sigma$ in color-magnitude space lends credence to the idea that the self-similar galaxies selected from the kdTree are representative of the input galaxy. Above the dotted line the data are plotted logarithmically; below it they are plotted linearly.
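The Figure 2 and Figure 3 captions describe two concrete operations. The first is redshift binning that is linear in $\ln(1+z)$. The second combines the clustering-$z$s of narrow $z_{\rm B}$ sub-bins into the estimate for the wider bin, weighting each sub-bin by its share of the objects. Below is a minimal Python sketch of both. It assumes each sub-bin estimate is already normalized to a PDF over the same redshift bins, and the function names are hypothetical.

```python
import math


def ln1pz_edges(z_min, z_max, n_bins):
    """Redshift bin edges spaced linearly in ln(1+z)."""
    lo, hi = math.log1p(z_min), math.log1p(z_max)
    step = (hi - lo) / n_bins
    return [math.expm1(lo + i * step) for i in range(n_bins + 1)]


def sum_sub_bins(sub_estimates, sub_counts):
    """Combine normalized clustering-z estimates of narrow photo-z sub-bins
    into the estimate for the wide bin they tile, weighting each sub-bin by
    N_sub / N_total. If every input sums to one, so does the output."""
    n_total = sum(sub_counts)
    n_z = len(sub_estimates[0])
    return [
        sum(est[j] * n / n_total for est, n in zip(sub_estimates, sub_counts))
        for j in range(n_z)
    ]
```

Following the Figure 3 caption, one would apply `sum_sub_bins` to each spatial bootstrap realization separately and then average over the realizations.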