Making the most of pure parallels: Machine learning augmented photometric redshifts for sparse JWST filter sets

Kenneth J. Duncan

TL;DR

The paper tackles the challenge of photometric redshift estimation for JWST surveys with sparse filter coverage by comparing traditional template fitting (EAzY) to ML-based methods (GPz and NNpz) and exploring hybrid consensus approaches. It demonstrates that NNpz provides the strongest single-method performance up to $z\sim8$, while GPz reduces catastrophic failures when combined with templates. Hierarchical Bayesian fusion of ML and template posteriors yields robust, low-scatter photo-$z$ estimates with reduced outliers ($\sigma_{\text{NMAD}} \approx 0.033$, $\text{OLF}_{0.15} \approx 0.063$ for $m_{\text{F444W}}<27.5$), improving reliability across redshifts. The results, applicable to PANORAMIC and BEACON pure-parallel surveys, underscore the value of ML and hybrid approaches for maximizing JWST data return, with code and notebooks made publicly available for reproducibility.

Abstract

Photometric redshifts (photo-$z$s) are an essential tool for galaxy evolution science with JWST. However, for deep surveys with more limited filter sets (i.e. $N_{\text{filt}} \sim6$) such as large pure parallel surveys, the most commonly used template-fitting based photo-$z$ approaches can yield highly confident but spurious results for high-$z$ populations of interest. The utility and legacy value of these datasets could therefore be negatively impacted. To address this challenge, we present an application of machine learning (ML) based photo-$z$ techniques to deep JWST photometric datasets. We employ two different ML algorithms, using Gaussian processes and nearest-neighbour estimates, alongside a more standard template fitting approach. We show that simple nearest-neighbour based estimates can provide more accurate photo-$z$s than template fitting out to $z\sim8$, as well as reducing the fraction of catastrophic outliers by a factor of $\sim2$–$3$. Additionally, 'hybrid' estimates combining template and ML can yield further improvements in overall accuracy and reliability while retaining some ability to predict photo-$z$ out to $z > 10$. The nearest-neighbour only or hybrid estimates can achieve photo-$z$s with robust scatter of $\sigma_{\text{NMAD}}\sim0.03$–$0.04$ and outlier fractions of $\sim3$–$10\%$ between $0 < z \lesssim 8$ from just 6 NIRCam bands, with negligible additional computational costs compared to standard template fitting. Our methodology is easily adaptable to alternative datasets, filter combinations or training samples. Overall, our results highlight the potential for even simple ML techniques to enhance the scientific return of JWST pure parallel and wide-area surveys.
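The two quality statistics quoted above, $\sigma_{\text{NMAD}}$ and $\text{OLF}_{0.15}$, are not defined on this page. The sketch below assumes the convention widely used in the photo-$z$ literature: $\sigma_{\text{NMAD}} = 1.48 \times \text{median}\left(|\Delta z - \text{median}(\Delta z)| / (1 + z_{\text{spec}})\right)$ with $\Delta z = z_{\text{phot}} - z_{\text{spec}}$, and an outlier defined by $|\Delta z| / (1 + z_{\text{spec}}) > 0.15$. The paper's exact definitions may differ, and the function names and example values here are illustrative only.

```python
from statistics import median


def sigma_nmad(z_phot, z_spec):
    """Normalised median absolute deviation of the photo-z residuals.

    Assumed convention (not taken from this page):
    1.48 * median(|dz - median(dz)| / (1 + z_spec)), with dz = z_phot - z_spec.
    """
    dz = [zp - zs for zp, zs in zip(z_phot, z_spec)]
    centre = median(dz)
    return 1.48 * median(abs(d - centre) / (1.0 + zs) for d, zs in zip(dz, z_spec))


def outlier_fraction(z_phot, z_spec, threshold=0.15):
    """Fraction of sources with |dz| / (1 + z_spec) above the threshold."""
    n_outliers = sum(
        abs(zp - zs) / (1.0 + zs) > threshold for zp, zs in zip(z_phot, z_spec)
    )
    return n_outliers / len(z_spec)


if __name__ == "__main__":
    # Made-up redshifts purely for illustration; not data from the paper.
    z_spec = [0.5, 1.2, 2.3, 3.1, 5.8, 7.4]
    z_phot = [0.52, 1.15, 2.4, 3.0, 1.1, 7.6]
    print("sigma_NMAD =", sigma_nmad(z_phot, z_spec))
    print("OLF_0.15   =", outlier_fraction(z_phot, z_spec))
```

Because both statistics use medians or counts rather than means, a handful of catastrophic failures moves the outlier fraction but barely changes the scatter, which is why the two are usually quoted together.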

Paper Structure

This paper contains 17 sections, 8 equations, 8 figures.

Figures (8)

  • Figure 1: Redshift and F277W magnitude ($m_{\text{F277W}}$) distribution for the spectroscopic training sample. The marginal redshift and magnitude distributions for the subsets and the combined sample (black) are shown in the histograms (above and right respectively).
  • Figure 2: Photo-$z$ scatter ($\sigma_{\text{NMAD}}$) and outlier fraction ($\text{OLF}_{0.15}$) from GPz as a function of the number of basis functions, $N_{\text{BF}}$, used for training. Data points and corresponding uncertainties show the median and 16th to 84th percentiles of the $\sigma_{\text{NMAD}}$/$\text{OLF}_{0.15}$ calculated for 100 bootstrap resamples of the test sample. Above $\sim70$ basis functions, the additional model complexity yields no significant gain in photo-$z$ precision or reliability.
  • Figure 3: Cumulative distribution of threshold credible intervals, $c$, ($\hat{F}(c)$) for the spectroscopic test sample before (dashed lines) and after (solid lines) uncertainty calibration for each of the individual photo-$z$ methodologies. Lines that rise above the 1:1 relation illustrate under-confidence in the photo-$z$ uncertainties (uncertainties overestimated) while lines that fall under illustrate over-confidence (uncertainties underestimated).
  • Figure 4: Panels illustrate the best photo-$z$ point-source estimate and corresponding uncertainties (see text for definitions) of each individual methodology for the same spectroscopic test sample. Sources selected as 'good', with well constrained primary photo-$z$ peaks, are shown as filled symbols with corresponding error bars, and the percentage of test sources selected by these criteria is annotated in the upper right corner of each panel. For sources that don't meet these criteria, we plot only the corresponding uncertainty range. For each method, we also show the photo-$z$ scatter and outlier statistics achieved for the 'good' photo-$z$ (and full) samples with $m_{\text{F444W}} < 27.5$.
  • Figure 5: Cumulative distribution of threshold credible intervals, $c$, ($\hat{F}(c)$) for the spectroscopic test sample for each of the three consensus photo-$z$ estimates derived from Hierarchical Bayesian combination of the template and ML methodologies. Lines that rise above the 1:1 relation illustrate under-confidence in the photo-$z$ uncertainties (uncertainties overestimated) while lines that fall under illustrate over-confidence (uncertainties underestimated).
  • ...and 3 more figures
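The calibration curves in Figures 3 and 5 can be illustrated with a minimal sketch. It assumes the commonly used construction in which each source's threshold credible interval $c$ is the smallest highest-probability-density credible level of its gridded posterior $P(z)$ that still contains the spectroscopic redshift, and $\hat{F}(c)$ is the fraction of sources with a threshold at or below $c$. The captions do not spell this construction out, so the definition, the function names and the toy posterior are all assumptions rather than the paper's implementation.

```python
def threshold_credible_interval(pz, z_grid, z_spec):
    """Smallest HPD credible level of a gridded P(z) that contains z_spec.

    Assumes a uniform redshift grid, so the posterior can be normalised by a
    plain sum over grid points.
    """
    total = sum(pz)
    probs = [p / total for p in pz]
    # Evaluate the posterior at the grid point nearest the spectroscopic redshift.
    i_spec = min(range(len(z_grid)), key=lambda i: abs(z_grid[i] - z_spec))
    p_spec = probs[i_spec]
    # Sum all probability at least as dense as the posterior at z_spec.
    return sum(p for p in probs if p >= p_spec)


def empirical_cdf(c_values, levels):
    """F_hat(c): fraction of sources whose threshold interval is <= c."""
    n = len(c_values)
    return [sum(c <= level for c in c_values) / n for level in levels]


if __name__ == "__main__":
    # Toy posterior shared by three hypothetical sources; not data from the paper.
    z_grid = [0.0, 1.0, 2.0, 3.0, 4.0]
    pz = [0.1, 0.2, 0.4, 0.2, 0.1]
    c_values = [threshold_credible_interval(pz, z_grid, zs) for zs in (2.0, 1.1, 4.0)]
    print(empirical_cdf(c_values, [0.25, 0.5, 0.75]))
```

For perfectly calibrated posteriors the values of $c$ are uniformly distributed, so $\hat{F}(c) = c$ and the curve follows the 1:1 relation. A curve above that line means the spectroscopic redshifts fall inside small credible intervals more often than expected, i.e. the uncertainties are overestimated, matching the reading given in the captions.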