Photometric Redshift Calibration with Self Organising Maps

Angus H. Wright, Hendrik Hildebrandt, Jan Luca van den Busch, Catherine Heymans

TL;DR

This study develops and tests a SOM-based direct photometric redshift calibration method to robustly calibrate cosmic shear tomographic redshift distributions. Using KiDS+VIKING-450 as the primary dataset and MICE2-based simulations, the authors quantify spectroscopic representation and assess biases from photometric noise and selection effects. They demonstrate that a 'gold' sample with 100% representation per tomographic bin recovers mean redshifts without bias when the photometry is noiseless and the spectroscopy is perfectly representative. They show that realistic noise and incomplete spectroscopy introduce small biases that can be mitigated with simple quality cuts, achieving Δ⟨z⟩ ≤ 0.025 (97.5% c.l.) in all tomographic bins, and they highlight the method's diagnostic advantages over previous direct calibration approaches. The work provides a practical pathway for improving redshift calibration in current and upcoming weak-lensing surveys, while noting the need for more extensive spectroscopic data for future Stage-IV surveys.
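
The method rests on a self-organising map that groups galaxies with similar photometry into cells. The numpy code below is a minimal sketch of that step and is not the authors' implementation. It trains a textbook online Kohonen map with a Gaussian neighbourhood and a linearly decaying learning rate, then assigns each galaxy to its best-matching cell. The helper names `train_som` and `assign_cells`, the grid size, the iteration count and the learning schedule are all hypothetical choices. The inputs are assumed to be a per-galaxy array of colours and/or magnitudes.

```python
import numpy as np


def train_som(data, nx=10, ny=10, n_iter=2000, lr0=0.5, seed=0):
    """Train a toy rectangular SOM on `data` (n_samples, n_features).

    Hypothetical helper for illustration: online Kohonen updates with a
    Gaussian neighbourhood and a linearly shrinking learning rate/radius.
    Returns the (nx * ny, n_features) array of cell weight vectors.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Grid position of every cell, used to measure neighbourhood distance
    gx, gy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    grid = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    # Initialise cell weights from randomly chosen training vectors
    weights = data[rng.integers(0, len(data), nx * ny)].copy()
    sigma0 = max(nx, ny) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # learning rate decays towards 0
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood radius shrinks
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the cell whose weight vector is closest to x
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Pull the BMU and its grid neighbours towards x
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma**2))
        weights += lr * h[:, None] * (x - weights)
    return weights


def assign_cells(data, weights):
    """Index of the best-matching SOM cell for every row of `data`."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)
```

Both the spectroscopic and photometric galaxies would be passed through `assign_cells` with the same trained map, so that every galaxy receives a cell index. For large catalogues, compute the pairwise distance array in `assign_cells` in chunks.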

Abstract

Accurate photometric redshift calibration is central to the robustness of all cosmology constraints from cosmic shear surveys. Analyses of the Kilo-Degree Survey (KiDS) re-weight training samples from all overlapping spectroscopic surveys to provide a direct redshift calibration. Using self-organising maps (SOMs), we demonstrate that this spectroscopic compilation is sufficiently complete for KiDS, representing $99\%$ of the effective 2D cosmic shear sample. We use the SOM to define a $100\%$ represented 'gold' cosmic shear sample, per tomographic bin. Using mock simulations of KiDS and the spectroscopic training set, we estimate the uncertainty on the SOM redshift calibration. We find that photometric noise, sample variance, and spectroscopic selection effects (including redshift and magnitude incompleteness) induce a combined maximal scatter on the bias of the redshift distribution reconstruction ($\Delta\langle z \rangle=\langle z \rangle_{\rm est}-\langle z \rangle_{\rm true}$) of $\sigma_{\Delta\langle z \rangle} \leq 0.006$ in all tomographic bins. We show that the SOM calibration is unbiased in the cases of noiseless photometry and perfectly representative spectroscopic datasets, as expected from theory. The inclusion of both photometric noise and spectroscopic selection effects in our mock data introduces a maximal bias of $\Delta\langle z \rangle =0.013\pm0.006$, or $\Delta\langle z \rangle \leq 0.025$ at $97.5\%$ confidence, once quality flags have been applied to the SOM. The method presented here represents a significant improvement over the previously adopted direct redshift calibration implementation for KiDS, owing to its diagnostic and quality assurance capabilities. The implementation of this method in future cosmic shear studies will allow better diagnosis, examination, and mitigation of systematic biases in photometric redshift calibration.
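
Given cell indices, the direct calibration and 'gold' selection described above can be sketched under a few simplifying assumptions:

- Each spectrum inherits its cell's summed photometric weight, divided by the number of spectra in that cell.
- Photometric galaxies in cells without spectra are excluded from the gold sample.
- $n_{\rm eff}$ follows the common lensing definition $(\sum w)^2/(A\sum w^2)$.

The helpers `som_reweight`, `neff_ratio` and `delta_mean_z` are hypothetical. The paper's own cell grouping and the 'QC1' quality cut are not reproduced.

```python
import numpy as np


def som_reweight(spec_cells, spec_z, phot_cells, phot_w, n_cells):
    """Cell-based direct calibration (hypothetical, simplified helper).

    spec_cells, phot_cells: SOM cell index of each spectroscopic/photometric
    galaxy; spec_z: spectroscopic redshifts; phot_w: photometric weights.
    """
    n_spec = np.bincount(spec_cells, minlength=n_cells)
    w_phot = np.bincount(phot_cells, weights=phot_w, minlength=n_cells)
    # 'Gold' photometric galaxies: those whose cell holds at least one spectrum
    gold = n_spec[phot_cells] > 0
    # Each spectrum carries its cell's photometric weight, shared equally
    spec_w = w_phot[spec_cells] / n_spec[spec_cells]
    # Weighted mean redshift of the reweighted spectra estimates the gold <z>
    z_est = np.sum(spec_w * spec_z) / np.sum(spec_w)
    return spec_w, gold, z_est


def neff_ratio(phot_w, gold):
    """n'_eff / n_eff, assuming n_eff = (sum w)^2 / (A sum w^2); A cancels."""
    def neff(w):
        return w.sum() ** 2 / np.sum(w**2)
    return neff(phot_w[gold]) / neff(phot_w)


def delta_mean_z(z_est, phot_ztrue, phot_w, gold):
    """Delta<z> = <z>_est - <z>_true, truth from the weighted gold sample (mocks only)."""
    z_true = np.sum(phot_w[gold] * phot_ztrue[gold]) / np.sum(phot_w[gold])
    return z_est - z_true
```

In a mock catalogue, `delta_mean_z` compares the estimate with the true weighted mean redshift of the gold sample, which is the quantity the abstract calls $\Delta\langle z \rangle$. Read as a Gaussian one-sided bound, the quoted bias of $0.013\pm0.006$ gives $0.013 + 1.96\times0.006 \approx 0.025$. This is consistent with the quoted $97.5\%$ limit.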

Paper Structure

This paper contains 25 sections, 8 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: The spectroscopic redshift distribution of our combined spectroscopic calibration dataset. The figure shows the redshift distribution as a kernel density estimate (KDE), constructed using a rectangular $\delta z=0.1$ kernel. The KDE is weighted such that lines are interpretable as the instantaneous counts per $\delta z$. The KDE is coloured by the fractional contribution from each of our 5 datasets to the total, which is shown by the black line.
  • Figure 2: The distribution of the 3 primary KV450 spectroscopic samples within the SOM. The figure on the left shows the SOM coloured by the fractional contribution of each of the 3 main spectroscopic samples from KV450. The ternary colour bar is shown on the right, and the makeup of individual cells is annotated within the colour bar as points. SOM cells that are filled entirely by sources from DEEP2, for example, are blue. Conversely, cells that are filled by equal mixtures from all 3 samples are grey. Cells that contain spectroscopic data from other surveys (which are not shown) are coloured white. Cells that contain photometric galaxies but no spectroscopy from any survey are coloured black. The figure highlights the complementarity between the DEEP2 and zCOSMOS data, as well as the breadth of coverage of the VVDS data.
  • Figure 3: The change in the value of $n^\prime_{\rm eff}/n_{\rm eff}$ for 100 different lines of sight (testing both noise and sample variance; green), 100 different noise realisations of a single line of sight (testing the importance of photometric noise; orange), 100 perfectly sampled spectroscopic catalogues (testing spectroscopic selection effects; purple), and 100 lines of sight excluding DEEP2 (testing the similarity between simulations and data; pink). The representations seen in the real KV450 data are also shown (black dashed lines). The distributions show that the simulations are a reasonable match to the observed representations, typically lying within $\pm5\%$ of the representations seen in the data. Photometric noise dominates the observed misrepresentation, and the MICE2 KV450-like spectroscopic compilation is typically $\sim5\%$ less representative of the full photometric sample than a perfectly sampled spectroscopic catalogue (with the exception of bin 3). Thus the majority of the under-representation is caused by Poisson sampling and photometric noise.
  • Figure 4: Our new KV450 redshift distribution estimates for the 'gold' sample: a reduced photometric sample of galaxies with $100\%$ representation in the spectroscopic sample, which also satisfies the quality cuts 'QC1'. The figure shows the reconstructed redshift distributions (green) alongside the purely tomographically binned spectroscopic data (purple). Each panel is annotated with the mean redshift of the purely tomographically binned sample ($\langle z \rangle_{\rm raw}$), the mean redshift of the weighted gold sample ($\langle z \rangle_{\rm w,g}$), the shift in the spectroscopic mean redshift produced by our reweighting ($\Delta \langle z \rangle$), and the fractional number of galaxies in the gold sample compared to the original KV450 cosmic shear sample ($n^\prime_{\rm eff}/n_{\rm eff}$).
  • Figure 5: The change in the effective number density of the MICE2 cosmic shear sample, $n_{\rm eff}^\prime / n_{\rm eff}$, caused by the choice of SOM construction parameters and training samples. Each panel shows one tomographic bin from a single realisation of the MICE2 mock KiDS dataset. We single out the 5 option choices that cause the most significant change to the observed representation in the SOMs: 3 different sets of training inputs, and 3 different spectroscopic dataset constructions. The legend indicates the spectroscopic samples used (KV450-like fiducial setup, without DEEP2, and using a perfect sampling of the photometric data) and/or the input training data (#colours:#magnitudes). The remaining $16$ SOM constructions within each of these subsets are shown as the variously coloured histograms. The construction of the SOM induces a $\sim$percent-level uncertainty on the representation fraction $n^\prime_{\rm eff}/n_{\rm eff}$, so the results presented throughout the paper are robust to the construction of our SOM.
  • ...and 5 more figures
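
Figure 1's caption describes a KDE built with a rectangular $\delta z=0.1$ kernel, normalised so that it reads as counts per $\delta z$ and coloured by each dataset's fractional contribution. The numpy code below is a minimal sketch of that construction. `tophat_counts` and `survey_fractions` are hypothetical helpers, the per-survey redshift arrays are assumed inputs, and plotting is omitted.

```python
import numpy as np


def tophat_counts(z, grid, dz=0.1):
    """Counts within a rectangular window of width dz centred on each grid point.

    This equals a top-hat KDE multiplied by N * dz, so the curve reads as the
    instantaneous number of objects per dz.
    """
    z = np.asarray(z, dtype=float)
    grid = np.asarray(grid, dtype=float)
    inside = np.abs(z[None, :] - grid[:, None]) < dz / 2.0
    return inside.sum(axis=1)


def survey_fractions(z_by_survey, grid, dz=0.1):
    """Total counts per dz and each survey's fractional contribution to them."""
    counts = np.array([tophat_counts(z, grid, dz) for z in z_by_survey])
    total = counts.sum(axis=0)
    # Grid points with no objects get a fraction of 0 instead of NaN
    with np.errstate(divide="ignore", invalid="ignore"):
        frac = np.where(total > 0, counts / total, 0.0)
    return total, frac
```

The `total` array corresponds to the black line in the caption. The rows of `frac` give the per-dataset shares used for colouring.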