The impact of spectroscopic incompleteness in direct calibration of redshift distributions for weak lensing surveys

W. G. Hartley, C. Chang, S. Samani, A. Carnero Rosell, T. M. Davis, B. Hoyle, D. Gruen, J. Asorey, J. Gschwend, C. Lidman, K. Kuehn, A. King, M. M. Rau, R. H. Wechsler, J. DeRose, S. R. Hinton, L. Whiteway, T. M. C. Abbott, M. Aguena, S. Allam, J. Annis, S. Avila, G. M. Bernstein, E. Bertin, S. L. Bridle, D. Brooks, D. L. Burke, M. Carrasco Kind, J. Carretero, F. J. Castander, R. Cawthon, M. Costanzi, L. N. da Costa, S. Desai, H. T. Diehl, J. P. Dietrich, B. Flaugher, P. Fosalba, J. Frieman, J. Garcia-Bellido, E. Gaztanaga, D. W. Gerdes, R. A. Gruendl, G. Gutierrez, D. L. Hollowood, K. Honscheid, D. J. James, S. Kent, E. Krause, N. Kuropatkin, O. Lahav, M. Lima, M. A. G. Maia, J. L. Marshall, P. Melchior, F. Menanteau, R. Miquel, R. L. C. Ogando, A. Palmese, F. Paz- Chinchon, A. A. Plazas, A. Roodman, E. S. Rykoff, E. Sanchez, V. Scarpine, M. Schubnell, S. Serrano, I. Sevilla-Noarbe, M. Smith, M. Soares-Santos, E. Suchyta, G. Tarle, M. A. Troxel, D. L. Tucker, T. N. Varga, J. Weller, R. D. Wilkinson

TL;DR

Accurate redshift distributions are essential for weak lensing cosmology. The authors simulate spectroscopic sampling, including observer flags, and apply a random-forest augmentation and colour-magnitude reweighting to assess biases in DES-like data, finding that incompleteness can yield $\Delta z$ biases up to $\sim$0.05 in the highest redshift bin. The work demonstrates that selection effects in spectroscopic targeting and redshift failures cannot be fully corrected by reweighting alone, particularly when high-redshift coverage is limited, and discusses mitigation strategies such as using lower-confidence redshifts, excluding problematic colour regions, and leveraging simulation-informed corrections. These results underscore the need for robust, multi-faceted calibration approaches (e.g., forward modelling and clustering-based methods) for future Stage IV surveys.

Abstract

Obtaining accurate distributions of galaxy redshifts is a critical aspect of weak lensing cosmology experiments. One of the methods used to estimate and validate redshift distributions is to apply weights to a spectroscopic sample so that its weighted photometry distribution matches that of the target sample. In this work we estimate the \textit{selection bias} in redshift that is introduced by this procedure. We do so by simulating the process of assembling a spectroscopic sample (including observer-assigned confidence flags) and highlight the impacts of spectroscopic target selection and redshift failures. We use the first year (Y1) weak lensing analysis in DES as an example data set, but the implications generalise to all similar weak lensing surveys. We find that using colour cuts that are not available to the weak lensing galaxies can introduce biases of $\Delta z\sim0.015$ in the weighted mean redshift of different redshift intervals. To assess the impact of incompleteness in spectroscopic samples, we select only objects with high observer-defined confidence flags and compare the weighted mean redshift with the true mean. We find that the mean redshift of the DES Y1 weak lensing sample is typically biased at the $\Delta z=0.005-0.05$ level after the weighting is applied. The bias we uncover can have either sign, depending on the samples and redshift interval considered. For the highest redshift bin, the bias is larger than the uncertainties in the other DES Y1 redshift calibration methods, justifying the decision not to use this method for the redshift estimates. We discuss several methods to mitigate this bias.
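The weighting procedure described in the abstract can be illustrated with a minimal sketch. It is not the scheme used in the paper: it assumes a simple histogram-cell density ratio as a stand-in, and the function name `histogram_weights`, the toy magnitudes, and the toy redshift relation are all hypothetical. Each spectroscopic object receives a weight equal to the ratio of target to spectroscopic counts in its photometry cell, so that the weighted spectroscopic photometry distribution matches that of the target sample.

```python
import numpy as np


def histogram_weights(spec_phot, target_phot, bins=10):
    """Weight spectroscopic objects so their binned photometry matches the target.

    spec_phot, target_phot: arrays of shape (n_objects, n_dimensions),
    e.g. columns of magnitude and colour. Returns weights that sum to 1.
    """
    spec_phot = np.asarray(spec_phot, dtype=float)
    target_phot = np.asarray(target_phot, dtype=float)
    both = np.vstack([spec_phot, target_phot])
    n_dim = both.shape[1]

    # Shared bin edges spanning both samples in every photometric dimension.
    edges = [
        np.linspace(both[:, d].min(), both[:, d].max(), bins + 1)
        for d in range(n_dim)
    ]
    spec_counts, _ = np.histogramdd(spec_phot, bins=edges)
    target_counts, _ = np.histogramdd(target_phot, bins=edges)

    # Cell index of each spectroscopic object (last bin includes its right edge).
    idx = tuple(
        np.clip(np.digitize(spec_phot[:, d], edges[d]) - 1, 0, bins - 1)
        for d in range(n_dim)
    )

    # Density ratio target/spec; spec_counts >= 1 in every occupied cell.
    weights = target_counts[idx] / spec_counts[idx]
    return weights / weights.sum()


# Toy demonstration (all numbers are made up for illustration).
rng = np.random.default_rng(0)
target_mag = rng.uniform(18.0, 22.5, size=20000)
target_z = 0.2 * (target_mag - 18.0) + rng.normal(0.0, 0.02, size=target_mag.size)

# Spectroscopic subsample that preferentially keeps bright objects.
keep = rng.uniform(size=target_mag.size) < np.exp(-(target_mag - 18.0))
spec_mag, spec_z = target_mag[keep], target_z[keep]

w = histogram_weights(spec_mag[:, None], target_mag[:, None], bins=20)
print("true mean z      :", target_z.mean())
print("unweighted mean z:", spec_z.mean())
print("weighted mean z  :", np.sum(w * spec_z))
```

In this toy case the selection depends only on a quantity that enters the weighting, so the weights can correct for it. The biases studied in the paper arise when the selection or the redshift failures depend on properties that the weighting does not see, which a sketch like this cannot capture.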

Paper Structure

This paper contains 23 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Example spectra from the simulated VVDS Deep survey. Top panels: Original rest-frame linear combination of k-correct components. Middle panels: Poisson-sampled spectrum including sky emission. Note that the apparent shape of the spectrum is dominated by the sky emission. Bottom panels: Final sky-subtracted and calibrated simulated spectrum (yellow) overlaid with the true spectrum (black). The former is what is passed on to the next stage for redshifting. The $i$-band magnitudes and the true redshifts of the two galaxies are listed at the top of the figure. The spectrum on the left is an example of a good spectrum (Flag=4), while the spectrum on the right is of relatively poor quality (Flag=2).
  • Figure 2: Random forest (RF) prediction of redshift quality flag against those determined by human observers. The mean predicted flags span a smaller range of values than the true flags, while the overall dispersion is of order 1. The bottom right inset shows a Receiver Operating Characteristic (ROC) curve of how well the RF performs in selecting objects to be retained or cut as the flag threshold value is changed. At the canonical threshold value of 3, the contamination by less secure objects and the RF-induced loss of high-confidence objects are both fairly well contained, at the $\sim5\%$ level. Samples cut with higher flag values are pure, but suffer a greater level of RF-induced incompleteness, resulting in a sample that is smaller than it should be. Conversely, at lower flag values the selected sample will be larger and more complete than it should be due to contamination by objects of intrinsically lower confidence. In Sec. \ref{sec:results}, this ROC curve translates into a slightly overestimated bias at the highest flag thresholds, and an underestimated bias at lower flag thresholds.
  • Figure 3: Distribution of human-determined redshift quality flags for our simulated datasets (dotted), compared with those from the real survey data (filled grey). We also overlay the calibrated flags in red.
  • Figure 4: Redshift distribution of galaxies matching the VIPERS colour selection: $(r-i)>0.5(u-g)$ or $(r-i)>0.7$ (solid), an $(r-i)<0.7$ sample and a sample selecting just where VIPERS overlaps at $(r-i)<0.7$. These latter two samples have different redshift distributions, and so re-weighting without $(u-g)$ colour information will result in biases. Inset: these three samples in $(u-g)$ vs. $(r-i)$ colour space.
  • Figure 5: Bias in the mean of the redshift distribution for four tomographic bins between a galaxy sample consisting of a mix of VIPERS and VVDS Wide galaxies and our target sample, selected through a simple $17.5<i<22.5$ cut. A weighting scheme is applied to the redshift distribution of the galaxy sample to account for the difference between the colour-magnitude distributions of the VIPERS/VVDS sample and the target sample. From left to right, we vary the relative fraction of VIPERS and VVDS galaxies. The mean redshift is biased high in the two lower redshift bins when a significant fraction of the sample comes from VIPERS.
  • ...and 7 more figures
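
The colour selection quoted in the Figure 4 caption, $(r-i)>0.5(u-g)$ or $(r-i)>0.7$, can be written as a boolean mask. The following is a minimal sketch only: the function name `vipers_like_mask` and the example magnitudes are hypothetical, and any further cuts applied by the real survey are not included.

```python
import numpy as np


def vipers_like_mask(u, g, r, i):
    """Return True where (r - i) > 0.5 * (u - g) or (r - i) > 0.7.

    Inputs are magnitudes in the u, g, r and i bands (scalars or arrays).
    Only the colour condition from the Figure 4 caption is implemented.
    """
    u, g, r, i = (np.asarray(band, dtype=float) for band in (u, g, r, i))
    r_minus_i = r - i
    u_minus_g = u - g
    return (r_minus_i > 0.5 * u_minus_g) | (r_minus_i > 0.7)


# Hypothetical magnitudes for three galaxies.
u = np.array([23.0, 24.0, 24.0])
g = np.array([22.0, 22.0, 22.0])
r = np.array([21.0, 21.0, 21.2])
i = np.array([20.4, 20.4, 20.4])
selected = vipers_like_mask(u, g, r, i)
print(selected)
```

Because the mask depends on $(u-g)$, a target sample without $u$-band photometry cannot reproduce it, which is why the caption notes that re-weighting without $(u-g)$ information leads to biases.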