Redshift inference from the combination of galaxy colors and clustering in a hierarchical Bayesian model $-$ Application to realistic $N$-body simulations

Alex Alarcon, Carles Sánchez, Gary M. Bernstein, Enrique Gaztañaga

TL;DR

This work extends a hierarchical Bayesian framework to infer galaxy redshift distributions by coherently combining priors, photometric data, and clustering information. It implements a realistic density-estimation scheme that uses kernel density estimators (KDEs) for the tracer densities and a parametric biasing function to map them to the true density, with a phenotype discretization based on self-organizing maps (SOMs) to capture galaxy diversity. Using realistic MICE2 simulations, the authors show that incorporating clustering information tightens redshift posteriors and largely mitigates biases in the prior, achieving mean redshift biases of order $10^{-3}$ even under substantial prior biases, and improving the full $n(z)$ shape by factors of up to 3–20 in $D_{\mathrm{KL}}$. The approach demonstrates robustness and practical potential for controlling redshift systematics in upcoming weak lensing surveys, while highlighting areas for future refinement such as sample-variance treatment and stochastic density-field modeling.

Abstract

Photometric galaxy surveys constitute a powerful cosmological probe but rely on the accurate characterization of their redshift distributions using only broadband imaging, and can be very sensitive to incomplete or biased priors used for redshift calibration. Sánchez & Bernstein (2019) presented a hierarchical Bayesian model which estimates these distributions from the robust combination of prior information, photometry of single galaxies and the information contained in the galaxy clustering against a well-characterized tracer population. In this work, we extend the method so that it can be applied to real data, developing some necessary new extensions to it, especially in the treatment of galaxy clustering information, and we test it on realistic simulations. After marginalizing over the mapping between the clustering estimator and the actual density distribution of the sample galaxies, and using prior information from a small patch of the survey, we find the incorporation of clustering information with photo-$z$'s to tighten the redshift posteriors, and to overcome biases in the prior that mimic those happening in spectroscopic samples. The method presented here uses all the information at hand to reduce prior biases and incompleteness. Even in cases where we artificially bias the spectroscopic sample to induce a shift in mean redshift of $\Delta\bar z \approx 0.05$, the final biases in the posterior are $\Delta\bar z \lesssim 0.003$. This robustness to flaws in the redshift prior or training samples would constitute a milestone for the control of redshift systematic uncertainties in future weak lensing analyses.

Paper Structure

This paper contains 25 sections, 21 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: (Upper panel:) Redshift distributions of the target and tracer samples. The target sample contains the galaxies for which we want to find a redshift distribution. The tracer sample contains galaxies with known redshifts that are used to add the clustering information into the redshift estimation. (Lower panel:) Redshift distribution of tomographic bins defined as in §\ref{sec:tomo}.
  • Figure 2: Mean redshift and redshift dispersion of cells in deep and wide SOMs described in §\ref{sec:soms}. The left and central columns show the SOM maps populated with these quantities, while the plots in the right column show the comparison of these distributions. These show how the deep SOM better samples the redshift space of the simulation test, with a lower redshift scatter per cell.
  • Figure 3: Density field estimation from a tracer sample using different kernel density estimators (KDEs). The panels show the field estimate for a small patch in the highest redshift bin. The black dots show the positions of the tracer galaxies, and the background colors show the estimated value of the density field at different positions. The top panels show a flat kernel with a large size ($r_{\mathrm{max}}=30\,\mathrm{Mpc}$, left) and a small size ($r_{\mathrm{max}}=3\,\mathrm{Mpc}$, right). The bottom left panel shows the density with a power-law kernel that better resolves the structures. The bottom right panel shows a field estimated with an optimized kernel, which is our default density field estimate. Note the change in color scales between panels, with white always corresponding to the mean density.
  • Figure 4: Comparison between a power-law KDE and a KDE with a power law that truncates at some scale $r^{*}$. Such truncation reduces the impact of shot noise on small scales and naturally adds a small exclusion region around the positions of tracers.
  • Figure 5: (Upper panel): Ratio between the abundance of target galaxies and random points as a function of estimated KDE density, for a power-law KDE with kernel $\propto r^{-0.8}$ and $r_{\mathrm{max}}=10\,\mathrm{Mpc}$. The different redshift bins are color coded. If the KDE delivered a perfectly unbiased field estimate of the target galaxies, we would expect to find the dashed line relation. All galaxies have been used without tomographic bin selection to obtain a better estimate. The true redshift of all the target galaxies was used, while in a real data scenario one could only estimate this relation in the smaller calibration fields. (Lower panel): Same as upper panel, but using an optimized KDE with $r_{\mathrm{max}}=15\,\mathrm{Mpc}$. The KDE is optimized from a function that combines a power law and an exponential truncation at small scales to deal with shot noise effects (see Fig. \ref{kde_optimize_formula}). The optimal parameters are found from a calibration field of $\sim 3.5\,\mathrm{deg}^{2}$ where redshifts for the target galaxies are known. It shows a more linear relation, although it remains substantially nonlinear at the extremes of density.
  • ...and 7 more figures
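
The flat and power-law kernels described in the captions of Figures 3 and 5 can be sketched in a few lines. This is a two-dimensional toy under stated assumptions, not the paper's estimator: the function names `kde_density` and `relative_density`, the uniform random tracer catalogue, the box size, and the small-radius floor `r_min` (used only to keep the power law finite at $r=0$) are all hypothetical. The optimized kernel with an exponential truncation is not reproduced here, since its functional form is not given in this summary.

```python
import numpy as np

def kde_density(points, tracers, r_max, slope=0.0, r_min=0.1):
    """Sum a radial kernel K(r) ~ r**(-slope) over tracers closer than r_max.

    slope=0 gives a flat (top-hat) kernel, i.e. a neighbour count.
    r_min floors the separation so the power law stays finite (assumption).
    """
    points = np.atleast_2d(points)
    tracers = np.atleast_2d(tracers)
    # Pairwise separations, shape (n_points, n_tracers)
    r = np.linalg.norm(points[:, None, :] - tracers[None, :, :], axis=-1)
    weights = np.where(r < r_max, np.maximum(r, r_min) ** (-slope), 0.0)
    return weights.sum(axis=1)

def relative_density(points, tracers, randoms, r_max, slope=0.0):
    """Density at `points` divided by its mean over random positions,
    so that a value of 1 corresponds to the mean density."""
    norm = kde_density(randoms, tracers, r_max, slope).mean()
    return kde_density(points, tracers, r_max, slope) / norm

# Toy catalogue (assumed): uniform tracers and randoms in a 100 x 100 box.
rng = np.random.default_rng(0)
box = 100.0
tracers = rng.uniform(0.0, box, size=(500, 2))
randoms = rng.uniform(0.0, box, size=(2000, 2))

flat = relative_density(randoms, tracers, randoms, r_max=30.0)
power = relative_density(randoms, tracers, randoms, r_max=10.0, slope=0.8)
print("flat kernel:      min %.2f, max %.2f" % (flat.min(), flat.max()))
print("power-law kernel: min %.2f, max %.2f" % (power.min(), power.max()))
```

Normalizing by the mean over random points mirrors the convention in Figure 3, where white corresponds to the mean density. The sketch ignores edge effects at the box boundary, which a real analysis would need to handle.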