Using different sources of ground truths and transfer learning to improve the generalization of photometric redshift estimation

Jonathan Soriano, Srinath Saikrishnan, Vikram Seenivasan, Bernie Boscoe, Jack Singal, Tuan Do

TL;DR

This work uses the COSMOS2020 survey to create TransferZ, a dataset of photometric redshift estimates derived from up to 35 imaging filters using template fitting. It then investigates two ways of using ground truth redshifts derived from both photometry and spectroscopy: transfer learning and direct combination of the two datasets.

Abstract

In this work, we explore methods to improve galaxy redshift predictions by combining different ground truths. Traditional machine learning models rely on training sets with known spectroscopic redshifts, which are precise but represent only a limited sample of galaxies. To make redshift models more generalizable to the broader galaxy population, we investigate transfer learning and directly combining ground truth redshifts derived from photometry and spectroscopy. We use the COSMOS2020 survey to create a dataset, TransferZ, which includes photometric redshift estimates derived from up to 35 imaging filters using template fitting. This dataset spans a wider range of galaxy types and colors than spectroscopic samples, though its redshift estimates are less accurate. We first train a base neural network on TransferZ and then refine it using transfer learning on a dataset of galaxies with more precise spectroscopic redshifts (GalaxiesML). In addition, we train a neural network on a combined dataset of TransferZ and GalaxiesML. Both methods reduce bias by $\sim 5\times$, RMS error by $\sim 1.5\times$, and catastrophic outlier rates by $1.3\times$ on GalaxiesML, compared to a baseline trained only on TransferZ. However, we also find that RMS and bias degrade when the models are evaluated on TransferZ data. Overall, our results demonstrate these approaches can meet cosmological requirements.
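The two training strategies described in the abstract (pretraining on a broad but noisier source set and then fine-tuning on a precise target set, versus training once on the union of both) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a one-feature linear model stands in for the neural network, the data are synthetic, and the names, learning rates, and epoch counts are hypothetical rather than taken from the paper.

```python
import random


def mse(params, xs, zs):
    """Mean squared error of the toy model z = w * x + b."""
    w, b = params
    return sum((w * x + b - z) ** 2 for x, z in zip(xs, zs)) / len(xs)


def train(params, xs, zs, lr, epochs):
    """Full-batch gradient descent on the MSE, starting from `params`."""
    w, b = params
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - z for x, z in zip(xs, zs)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def true_z(x):
    """Hypothetical relation between a single feature and redshift."""
    return 1.2 * x + 0.1


rng = random.Random(0)

# Source set (stand-in for photometric labels): broad feature coverage,
# noisier labels with a small systematic offset. Synthetic, not real data.
src_x = [rng.uniform(0.0, 1.0) for _ in range(500)]
src_z = [true_z(x) + 0.05 + rng.gauss(0.0, 0.1) for x in src_x]

# Target set (stand-in for spectroscopic labels): narrower coverage,
# precise labels. Synthetic, not real data.
tgt_x = [rng.uniform(0.0, 0.5) for _ in range(100)]
tgt_z = [true_z(x) + rng.gauss(0.0, 0.01) for x in tgt_x]

# Baseline: train on the source set only.
base = train((0.0, 0.0), src_x, src_z, lr=0.1, epochs=500)

# Transfer learning: start from the baseline weights and fine-tune on the
# target set with a smaller learning rate.
fine_tuned = train(base, tgt_x, tgt_z, lr=0.02, epochs=100)

# Combination: train from scratch on the union of both sets.
combo = train((0.0, 0.0), src_x + tgt_x, src_z + tgt_z, lr=0.1, epochs=500)

base_err = mse(base, tgt_x, tgt_z)
tl_err = mse(fine_tuned, tgt_x, tgt_z)
combo_err = mse(combo, tgt_x, tgt_z)
print(f"target MSE  base={base_err:.5f}  fine-tuned={tl_err:.5f}  combo={combo_err:.5f}")
```

In this toy setting, fine-tuning lowers the error on the target set relative to the baseline because gradient descent continues from the pretrained weights on the target loss. It says nothing about the size of the effect for a real network on real photometry.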

Paper Structure

This paper contains 10 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Two datasets: GalaxiesML (Do et al. 2024), with spectroscopic redshift ground truth, and TransferZ, with multi-band imaging redshift ground truth from the COSMOS2020 survey (Weaver et al. 2022). The distribution of the datasets in redshift (left), i-band magnitude (center), and color (right) shows how the datasets complement each other to help the model generalize beyond the range of brightness and color sampled by the spectroscopic surveys.
  • Figure 2: Comparison of redshift predictions from the three neural network models in this work (NN-Base, NN-TL, and NN-Combo) against true redshift values. We show results for the GalaxiesML test dataset using spectroscopic redshift as ground truth. Results shown are from one randomly selected run out of 100 total iterations.
  • Figure 3: From left to right, comparison of the bias, outlier, and RMS metrics between the baseline NN, transfer-learnt NN, combo NN, and Jones et al. (2024). The metrics are evaluated on the target (blue) and source (orange) data within the range of $0.3\leq z \leq 1.5$. The error bars are generated from 100 random initializations of the model training. For Jones et al. (2024), we report their scatter value as the RMS: although they use a different RMS definition, our RMS calculation is equivalent to their reported scatter value.
  • Figure 4: Flow chart showing the steps used in creating the TransferZ dataset. Green rectangles represent processes and blue rectangles represent inputs. The red rectangle is the dataset released with this paper.
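
The bias, outlier, and RMS metrics compared in Figure 3 can be sketched as follows. This section does not give the paper's exact definitions, so the sketch assumes a common photometric-redshift convention: residuals scaled by $1 + z_{\mathrm{true}}$ and a hypothetical outlier threshold of 0.15. The function names and example values are also illustrative assumptions. Only the evaluation range $0.3 \leq z \leq 1.5$ comes from the caption, and applying it to the true redshift is itself an assumption.

```python
import math


def select_range(z_pred, z_true, z_min=0.3, z_max=1.5):
    """Keep pairs whose true redshift lies in [z_min, z_max] (assumed cut)."""
    pairs = [(p, t) for p, t in zip(z_pred, z_true) if z_min <= t <= z_max]
    return [p for p, _ in pairs], [t for _, t in pairs]


def photoz_metrics(z_pred, z_true, outlier_threshold=0.15):
    """Bias, RMS, and outlier fraction of the scaled residuals.

    Assumed convention: delta = (z_pred - z_true) / (1 + z_true).
    The threshold of 0.15 is a hypothetical default, not the paper's value.
    """
    n = len(z_true)
    scaled = [(p - t) / (1.0 + t) for p, t in zip(z_pred, z_true)]
    bias = sum(scaled) / n
    rms = math.sqrt(sum(d * d for d in scaled) / n)
    outlier_fraction = sum(1 for d in scaled if abs(d) > outlier_threshold) / n
    return {"bias": bias, "rms": rms, "outlier_fraction": outlier_fraction}


# Made-up example values; the first galaxy falls outside the redshift range.
example_true = [0.2, 0.5, 1.0, 1.4]
example_pred = [0.25, 0.5, 1.4, 1.4]

pred_in, true_in = select_range(example_pred, example_true)
metrics = photoz_metrics(pred_in, true_in)
print(metrics)
```

Averaging such metrics over repeated training runs with different random initializations, as the caption describes, would give the error bars. That loop is omitted here.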