Mapping the Galaxy Color-Redshift Relation: Optimal Photometric Redshift Calibration Strategies for Cosmology Surveys

Daniel Masters, Peter Capak, Daniel Stern, Olivier Ilbert, Mara Salvato, Samuel Schmidt, Giuseppe Longo, Jason Rhodes, Stephane Paltani, Bahram Mobasher, Henk Hoekstra, Hendrik Hildebrandt, Jean Coupon, Charles Steinhardt, Josh Speagle, Andreas Faisst, Adam Kalinich, Mark Brodwin, Massimo Brescia, Stefano Cavuoti

TL;DR

This work presents a data-driven framework that uses self-organizing maps (SOMs) to map the high-dimensional color space of Euclid-like photometry and to assess how well current spectroscopic redshifts cover that space. By projecting COSMOS data into an 8-band (7-color), Euclid-like color space, the authors estimate the galaxy density in color space, $\rho(\vec{C})$, identify color-space regions that lack secure spectroscopic redshifts, and derive a formal sampling strategy for calibrating the color-redshift relation with minimal spectroscopy. They show that the mean redshift $\langle z \rangle$ of a tomographic bin can be constrained to $\Delta\langle z \rangle \approx \sigma_{\langle z_i \rangle}/\sqrt{c}$, where $c$ is the number of color cells contributing to the bin and $\sigma_{\langle z_i \rangle}$ is the typical uncertainty on the mean redshift of a single cell. This result guides spectroscopic allocation toward densely populated, high-uncertainty cells and leads to a projected total of ~10–15k spectra for Euclid, depending on strategy. The approach also offers practical insights for identifying degeneracies, refining template priors, and planning calibration campaigns for Euclid, DES, LSST, and WFIRST.
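
As a rough illustration of the scaling quoted in the TL;DR, the sketch below assumes that a tomographic bin is built from c independent, equally weighted color cells, each with the same uncertainty on its mean redshift. The function names and any numbers passed to them are hypothetical and are not taken from the paper.

```python
import math


def mean_z_uncertainty(sigma_cell: float, n_cells: int) -> float:
    """Uncertainty on a bin's mean redshift, sigma_cell / sqrt(n_cells).

    Assumes independent, equally weighted cells that share the same
    per-cell uncertainty sigma_cell (a simplification for illustration).
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return sigma_cell / math.sqrt(n_cells)


def cells_needed(sigma_cell: float, target: float) -> int:
    """Smallest number of cells for which sigma_cell / sqrt(c) <= target."""
    if sigma_cell <= 0 or target <= 0:
        raise ValueError("sigma_cell and target must be positive")
    # The small offset guards against floating-point noise just above an integer.
    return max(1, math.ceil((sigma_cell / target) ** 2 - 1e-9))
```

For example, `cells_needed(0.5, 0.125)` returns 16, because the uncertainty falls only as the square root of the number of cells.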

Abstract

Calibrating the photometric redshifts of >10^9 galaxies for upcoming weak lensing cosmology experiments is a major challenge for the astrophysics community. The path to obtaining the required spectroscopic redshifts for training and calibration is daunting, given the anticipated depths of the surveys and the difficulty in obtaining secure redshifts for some faint galaxy populations. Here we present an analysis of the problem based on the self-organizing map, a method of mapping the distribution of data in a high-dimensional space and projecting it onto a lower-dimensional representation. We apply this method to existing photometric data from the COSMOS survey selected to approximate the anticipated Euclid weak lensing sample, enabling us to robustly map the empirical distribution of galaxies in the multidimensional color space defined by the expected Euclid filters. Mapping this multicolor distribution lets us determine where - in galaxy color space - redshifts from current spectroscopic surveys exist and where they are systematically missing. Crucially, the method lets us determine whether a spectroscopic training sample is representative of the full photometric space occupied by the galaxies in a survey. We explore optimal sampling techniques and estimate the additional spectroscopy needed to map out the color-redshift relation, finding that sampling the galaxy distribution in color space in a systematic way can efficiently meet the calibration requirements. While the analysis presented here focuses on the Euclid survey, similar analysis can be applied to other surveys facing the same calibration challenge, such as DES, LSST, and WFIRST.
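
To make the self-organizing map idea concrete, here is a minimal NumPy sketch of SOM training and cell assignment. It is not the authors' implementation: the Euclidean distance, Gaussian neighborhood, decay schedules, small default grid, and all function and parameter names are assumptions chosen for brevity.

```python
import numpy as np


def train_som(colors, shape=(10, 15), n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM on an (n_galaxies, n_colors) array.

    Assumed choices: Euclidean distance in color space, a Gaussian
    neighborhood on the grid, and simple decay schedules.
    """
    colors = np.asarray(colors, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = shape
    n_dim = colors.shape[1]
    # Initialize each cell with a randomly chosen training vector.
    start = rng.integers(0, len(colors), rows * cols)
    weights = colors[start].copy().reshape(rows, cols, n_dim)
    gy, gx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 / sigma0) ** frac  # neighborhood width tends to 1 cell
        x = colors[rng.integers(len(colors))]
        # Best-matching unit: the cell whose weight vector is closest to x.
        d2 = ((weights - x) ** 2).sum(axis=2)
        by, bx = np.unravel_index(np.argmin(d2), d2.shape)
        # Pull the BMU and its grid neighbors toward x.
        grid_d2 = (gy - by) ** 2 + (gx - bx) ** 2
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, :, None] * (x - weights)
    return weights


def best_matching_cells(weights, colors):
    """Return (row, col) index arrays of the closest cell for each galaxy.

    Builds an (n_galaxies, n_cells, n_colors) array, so it is only
    suitable for small samples.
    """
    rows, cols, n_dim = weights.shape
    flat = weights.reshape(-1, n_dim)
    colors = np.asarray(colors, dtype=float)
    d2 = ((colors[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    return np.unravel_index(d2.argmin(axis=1), (rows, cols))
```

Because neighboring cells are updated together, cells that end up close on the grid tend to hold similar color vectors, which is the topological property that makes the map useful for checking where a spectroscopic sample does and does not cover color space.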

Paper Structure

This paper contains 23 sections, 12 equations, 10 figures.

Figures (10)

  • Figure 1: The 7-color self-organized map (SOM) generated from $\sim$131k galaxies from the COSMOS survey, selected to be representative of the anticipated Euclid weak lensing sample. In the center is the $75\times150$ map itself, which encodes the empirical ugrizYJH spectral energy distributions (SEDs) that appear in the data. The map is colored here by converting the H, i, and u band photometry of the cells to analogous RGB values, while the brightness is scaled to reflect the average brightness of galaxies in different regions of color space. On the sides we show examples of 8-band galaxy SEDs represented by particular cells, whose positions in the map are indicated with arrows. The cell SEDs are shown as black squares. The actual SEDs (shifted to line up in $i$-band magnitude) of galaxies associated with the cells are overlaid as green diamonds. Between 9 and 23 separate galaxy SEDs are plotted for each of the cells shown, but they are similar enough that they are hard to differentiate on this figure. A key feature of the map is that it is topological, in the sense that nearby cells represent objects with similar SEDs, as can be seen from the two example cells shown in the upper left. Note that the axes of the SOM do not correspond to any physical quantity, but merely denote positions of cells within the map and are shown to ease comparison between figures.
  • Figure 2: The variation of two colors along the self-organizing map: $u-g$ on the left and $g-r$ on the right. In the language of machine learning, these are "features" in the data that drive the overall structure of the map. The well-known Lyman break is evident for galaxies at $2.5\lesssim z \lesssim 3$ in $u-g$ and $3 \lesssim z \lesssim 4$ in $g-r$ (around x=50, y=90). The regions with red $g-r$ color spreading diagonally across the lower part of the map are a combination of passive galaxies and dusty galaxies at lower redshift.
  • Figure 3: The SOM colored by the number of galaxies in the overall sample associated with each color cell. The coloration is effectively our estimate of $\rho(\vec{C})$, the density of galaxies as a function of position in color space.
  • Figure 4: Photo-z estimates across the map, computed in two ways. Left: Photo-z's computed directly for each cell by applying the Le Phare template fitting code to the 8-band photometry represented by the cells. Right: Photometric redshifts for the cells computed as the median of the 30-band COSMOS photo-z's of the objects associated with each cell.
  • Figure 5: The dispersion in the photo-z computed with the Le Phare template fitting code as a function of color cell. As can be seen, high dispersion regions predominantly fall in localized areas of color space near the boundary separating high and low redshift galaxies.
  • ...and 5 more figures
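
The per-cell quantities described in the captions for Figures 3 and 4 (occupation density and median photo-z), together with the spectroscopic coverage check described in the abstract, can be sketched as follows. This is a minimal illustration that assumes each galaxy has already been assigned a flattened cell index; the function name, argument names, and returned keys are hypothetical.

```python
import numpy as np


def cell_statistics(cell_index, photo_z, has_specz, n_cells):
    """Summarize galaxies per SOM cell.

    cell_index : flattened cell index (0 .. n_cells - 1) of each galaxy
    photo_z    : photometric redshift of each galaxy
    has_specz  : True where a galaxy has a secure spectroscopic redshift
    """
    cell_index = np.asarray(cell_index, dtype=int)
    photo_z = np.asarray(photo_z, dtype=float)
    has_specz = np.asarray(has_specz, dtype=bool)
    if cell_index.size == 0:
        raise ValueError("need at least one galaxy")
    if cell_index.min() < 0 or cell_index.max() >= n_cells:
        raise ValueError("cell_index out of range")

    # Number of galaxies, and of spectroscopic galaxies, in each cell.
    counts = np.bincount(cell_index, minlength=n_cells)
    n_spec = np.bincount(cell_index[has_specz], minlength=n_cells)

    # Group photo-z values by cell: sort by cell, then split at cell boundaries.
    order = np.argsort(cell_index, kind="stable")
    groups = np.split(photo_z[order], np.cumsum(counts)[:-1])
    median_z = np.array([np.median(g) if g.size else np.nan for g in groups])

    return {
        "density": counts / counts.sum(),           # fraction of the sample per cell
        "median_photo_z": median_z,                 # NaN for empty cells
        "uncovered": (counts > 0) & (n_spec == 0),  # occupied cells with no spec-z
    }
```

Cells flagged as `uncovered` are occupied by galaxies in the photometric sample but contain no spectroscopic redshift, which is the kind of gap the color-space mapping is meant to expose.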