
SHAPE. I. A SOM-SED hybrid approach for efficient galaxy parameter estimation leveraging JWST

Zihao Wang, Tao Wang, Ke Xu, Hanwen Sun, Ruining Tian, Qi Hao

TL;DR

SHAPE introduces a SOM-SED Hybrid Approach that leverages JWST PRIMER data to calibrate galaxy parameter estimation for upcoming wide-field surveys. By clustering galaxies with a SOM and constructing an SED Lib from cell-average templates, SHAPE extends parameter inference to diverse filter sets (e.g., COSMOS2020, CSST, Euclid) with a continuous, probabilistic mapping that preserves efficiency. It achieves near-SED-fitting accuracy for stellar mass and star formation rate (NMAD < 0.2 dex) while dramatically reducing computation time, and demonstrates improved SFR estimates under limited-band photometry. The approach promises scalable, cross-survey parameter estimation for next-generation surveys, while acknowledging limitations from training size, missing data, and redshift inference, and outlining paths for SHAPE II enhancements.
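The SED Lib step can be pictured as template matching against the cell-average SEDs. The sketch below is a minimal illustration under stated assumptions, not the SHAPE implementation. It assumes three things. First, each cell's stacked SED has already been synthesized in the target filters (`lib_flux`). Second, the per-cell labels (`lib_params`) are log stellar mass and log SFR normalized to the template flux scale. Third, the "continuous, probabilistic mapping" can be approximated by weighting every cell by $\exp(-\chi^2/2)$. The function name `match_sed_lib` and all of its arguments are hypothetical.

```python
import numpy as np


def match_sed_lib(flux, flux_err, lib_flux, lib_params):
    """Likelihood-weighted match of one galaxy to cell-average SED templates (assumed scheme).

    flux, flux_err: (n_band,) observed fluxes and 1-sigma errors in the target filters.
    lib_flux: (n_cell, n_band) cell-average SEDs synthesized in the same filters.
    lib_params: (n_cell, n_param) per-cell log M* and log SFR, assumed normalized so
        that they shift by log10(s) when a template is scaled by s.
    Returns (weighted parameter estimate, best-fitting cell index, its scale factor).
    """
    w = 1.0 / flux_err**2  # inverse-variance weights
    # Analytic chi^2-minimizing normalization of each template: s = sum(w f t) / sum(w t^2)
    scale = (w * flux * lib_flux).sum(axis=1) / (w * lib_flux**2).sum(axis=1)
    chi2 = (w * (flux - scale[:, None] * lib_flux) ** 2).sum(axis=1)
    # Turn chi^2 into normalized weights; subtracting the minimum avoids underflow
    like = np.exp(-0.5 * (chi2 - chi2.min()))
    like /= like.sum()
    # Shift normalized log-parameters by the fitted scale (non-positive scales clipped)
    log_s = np.log10(np.clip(scale, 1e-300, None))
    est = like @ (lib_params + log_s[:, None])
    best = int(np.argmin(chi2))
    return est, best, float(scale[best])
```

The best cell (`best`) gives a hard assignment. The likelihood-weighted average is one simple way to obtain a smooth estimate when neighboring cells fit comparably well.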

Abstract

With the launch and operation of next-generation ground- and space-based telescopes, astronomy has entered the era of big data, which demands more efficient and robust data-analysis methods. Most traditional parameter-estimation methods cannot reconcile differences between photometric systems. Ideally, high-quality observations from facilities such as JWST would be used to calibrate and improve upcoming wide-field surveys such as the China Space Station Telescope (CSST) and Euclid. To this end, we introduce a new approach (SHAPE, SOM-SED Hybrid Approach for efficient Parameter Estimation) that bridges different photometric systems and efficiently estimates key galaxy parameters, such as stellar mass ($M_\star$) and star formation rate (SFR), using data from a large, deep JWST/NIRCam and MIRI survey (PRIMER). As a test of the methodology, we focus on galaxies at $z\sim 1.5-2.5$. To mitigate discrepancies between input colors and the training set, we replace the default SOM weights with the stacked SEDs of each cell, which extends the applicability of our model to other photometric catalogs (e.g., COSMOS2020). By incorporating an SED library (SED Lib), we apply this JWST-calibrated model to the COSMOS2020 catalog. Despite the limited sample size and potential template-related uncertainties, the SOM-derived parameters agree well with results from SED fitting using extended photometry. Under identical photometric constraints from CSST and Euclid bands, our method outperforms traditional SED-fitting techniques in SFR estimation, with both a reduced bias (-0.01 vs. 0.18) and a smaller $\sigma_{\rm NMAD}$ (0.25 vs. 0.35). Processing about $10^6$ sources per CPU per hour in the estimation phase, this JWST-calibrated estimator holds significant promise for next-generation wide-field surveys.
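The bias and $\sigma_{\rm NMAD}$ quoted above are residual statistics of $\Delta = \log X_{\rm est} - \log X_{\rm ref}$. This sketch assumes a common convention rather than taking it from the paper. In that convention, bias $= \mathrm{median}(\Delta)$ and $\sigma_{\rm NMAD} = 1.4826\,\mathrm{median}(|\Delta - \mathrm{median}(\Delta)|)$. The factor 1.4826 makes the NMAD equal to the standard deviation for Gaussian residuals. The function name `bias_and_nmad` is hypothetical.

```python
import numpy as np


def bias_and_nmad(estimated, reference):
    """Median bias and normalized median absolute deviation of log-space residuals (assumed convention)."""
    delta = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    bias = np.median(delta)
    # 1.4826 rescales the MAD so it matches the standard deviation of Gaussian residuals
    nmad = 1.4826 * np.median(np.abs(delta - bias))
    return float(bias), float(nmad)
```

Because both statistics are medians, a handful of catastrophic outliers barely moves them. This robustness is why NMAD-style metrics are often preferred over the plain standard deviation when comparing parameter estimators.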

Paper Structure

This paper contains 24 sections, 9 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Schematic diagram of the SHAPE model. This method employs a SOM to cluster galaxies in the training set and assigns each SOM cell a representative average SED. When the photometric filters of the test set match those of the training set, galaxies can be projected directly onto the SOM for parameter estimation. Otherwise, each galaxy is matched to the SED library (SED Lib) constructed from the SOM to determine its physical parameters.
  • Figure 2: Observational properties of the galaxy samples used throughout this work, selected according to the criteria in the sample-selection section. Left: Stellar mass distribution of the JWST training set (black), the COSMOS2020 test sample (blue), and the mock CSST+Euclid catalog (red dashed). Middle: SFR distribution of the three samples. Right: Magnitude distribution of the COSMOS2020 sample, with the $5\sigma$ depths of COSMOS2020 and CSST marked.
  • Figure 3: Undersampling fraction (the fraction of SOM cells containing fewer than five galaxies; purple) and quantization error (the average distance $\Delta$ between galaxies and their SOM weights, as defined in the paper's distance equation; orange) as a function of SOM size, for a limited data volume of 7,507 galaxies. While the undersampling fraction increases monotonically with size, the quantization error first decreases and then increases, reaching its minimum at a SOM size of $30\times30$.
  • Figure 4: SOM of JWST PRIMER galaxies at $z\sim 1.5-2.5$ selected as described in the sample-selection section. Left: Number of galaxies per cell. Middle: Similarity between the galaxies in each cell and the corresponding SOM weight, quantified with the distance $\Delta$ defined in the paper's distance equation. Right: The same similarity including photometric errors. Compared with Davidzon et al. (2022), the scatter is significantly reduced when using the JWST catalog.
  • Figure 5: Label maps of normalized stellar mass and SFR. For each pair, the left panel displays the SOM grid color-coded by the mode of the corresponding parameter (stellar mass or SFR), while the right panel shows the distribution width, defined as the 84th–16th percentile range $\langle\delta\rangle = \langle84\%-16\%\rangle$, within each cell.
  • ...and 8 more figures
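Figures 3 to 5 rest on a few standard SOM operations: training the map, assigning each galaxy to its best-matching cell, and summarizing each cell. The NumPy sketch below illustrates them under explicit assumptions. It uses an online SOM with a Gaussian neighborhood and a linearly decaying learning rate, which is not necessarily the paper's training scheme. A Euclidean or error-weighted distance stands in for the paper's $\Delta$. The per-cell median and 16th-to-84th percentile width stand in for the per-cell mode and spread. Median-stacked normalized SEDs stand in for the cell-average templates that replace the default SOM weights. All function names are hypothetical.

```python
import numpy as np


def train_som(data, nx, ny, n_iter=5000, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training on an nx-by-ny rectangular grid (assumed scheme).

    data: (n_gal, n_feat) array of features, e.g. normalized colors.
    Returns weights (nx*ny, n_feat) and grid coordinates (nx*ny, 2).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    grid = np.array([(i, j) for i in range(nx) for j in range(ny)], dtype=float)
    # Initialize each cell's weight vector with a randomly drawn galaxy
    weights = data[rng.integers(0, n, nx * ny)].astype(float)
    if sigma0 is None:
        sigma0 = max(nx, ny) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood, floored at 0.5
        x = data[rng.integers(0, n)]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        g2 = ((grid - grid[bmu]) ** 2).sum(axis=1)          # squared grid distance to BMU
        h = np.exp(-g2 / (2.0 * sigma**2))                  # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)
    return weights, grid


def assign_cells(data, weights, err=None):
    """Best-matching cell and distance Delta per galaxy (optionally error-weighted)."""
    inv_var = np.ones_like(data) if err is None else 1.0 / err**2
    # Expand sum_k (x_k - w_k)^2 / e_k^2 to avoid an (n_gal, n_cell, n_feat) array
    dist2 = ((data**2 * inv_var).sum(axis=1)[:, None]
             - 2.0 * (data * inv_var) @ weights.T
             + inv_var @ (weights**2).T)
    dist2 = np.maximum(dist2, 0.0)  # clip tiny negative round-off
    cells = dist2.argmin(axis=1)
    delta = np.sqrt(dist2[np.arange(len(data)), cells])
    return cells, delta


def som_diagnostics(data, weights, min_count=5):
    """Undersampling fraction (cells with < min_count galaxies) and mean quantization error."""
    cells, delta = assign_cells(data, weights)
    counts = np.bincount(cells, minlength=len(weights))
    return np.mean(counts < min_count), delta.mean(), counts


def summarize_cells(cells, values, n_cells):
    """Per-cell median and 84th-16th percentile width; empty cells stay NaN.

    values: (n_gal,) labels such as log M*, or (n_gal, n_band) normalized SEDs,
    in which case the medians are the stacked cell SEDs.
    """
    values = np.asarray(values, dtype=float)
    shape = (n_cells,) + values.shape[1:]
    med = np.full(shape, np.nan)
    width = np.full(shape, np.nan)
    for c in np.unique(cells):
        p16, p50, p84 = np.percentile(values[cells == c], [16, 50, 84], axis=0)
        med[c], width[c] = p50, p84 - p16
    return med, width
```

One way to explore the trade-off in Figure 3 is to scan `nx = ny` over a range of sizes and record the undersampling fraction and quantization error from `som_diagnostics`. The location of the optimum will depend on the features and the distance definition used.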