COSMOS-Web: Estimating Physical Parameters of Galaxies Using Self-Organizing Maps

Fatemeh Abedini, Ghassem Gozaliasl, Akram Hasani Zonoozi, Atousa Kalantari, Maarit Korpi-Lagg, Olivier Ilbert, Hollis Akins, Natalie Allen, Rafael Arango-Toro, Caitlin Casey, Nicole Drakos, Andreas Faisst, Carter Flayhart, Maximilien Franco, Hosein Haghi, Aryana Haghjoo, Santosh Harish, Hossein Hatamnia, Jeyhan Kartaltepe, Ali Khostovan, Anton Koekemoer, Vasily Kokorev, Rebecca Larson, Gavin Leroy, Daizhong Liu, Henry McCracken, Jed McKinney, Nicolas McMahon, Wilfried Mercier, Bahram Mobasher, Sophie Newman, Louise Paquereau, Jason Rhodes, Brant Robertson, Sogol Sanjaripour, Marko Shuntov, Sina Taamoli, Sune Toft, Francesco Valentino, Eleni Vardoulaki, John Weaver

TL;DR

This work demonstrates that Self-Organizing Maps (SOMs) can infer key galaxy physical parameters ($z$, $M_*$, $\text{SFR}$, $\text{sSFR}$, and $\text{age}_{mw}$) from multiband photometry in COSMOS-Web (CW), by training on both HORIZON-AGN (HZ-AGN) mocks and CW data. The authors introduce a covariate-shift alignment to transfer the SOM from simulation to observation, and they evaluate performance using NMAD, RMSE, and Pearson $r$ across redshift bins. On HZ-AGN, redshift, mass, and SFR predictions are accurate with strong correlations, while CW predictions show more degeneracy and scatter, particularly for redshift and age, though stellar mass remains relatively well constrained. When the HZ-AGN SOM is applied to CW data, the covariate-alignment approach improves consistency, indicating that SOMs can be a fast, interpretable alternative or complement to SED fitting for future large-volume surveys that include JWST bands. Overall, the study highlights both the promise and the challenges of SOM-based galaxy parameter estimation in the era of JWST-based photometry.

Abstract

The COSMOS-Web survey, with its unparalleled combination of multiband data, notably near-infrared imaging from JWST's NIRCam (F115W, F150W, F277W, and F444W), provides a transformative dataset down to $\sim28$ mag (F444W) for studying galaxy evolution. In this work, we employ Self-Organizing Maps (SOMs), an unsupervised machine learning method, to estimate key physical parameters of galaxies -- redshift, stellar mass, star formation rate (SFR), specific SFR (sSFR), and age -- directly from photometric data out to $z=3.5$. SOMs efficiently project high-dimensional galaxy color information onto 2D maps, showing how physical properties vary among galaxies with similar spectral energy distributions. We first validate our approach using mock galaxy catalogs from the HORIZON-AGN simulation, where the SOM accurately recovers the true parameters, demonstrating its robustness. Applying the method to COSMOS-Web observations, we find that the SOM delivers robust estimates despite the increased complexity of real galaxy populations. Performance metrics ($\sigma_{\mathrm{NMAD}}$ typically between $0.1$--$0.3$, and Pearson correlation between $0.7$ and $0.9$) confirm the precision of the method, with $\sim70\%$ of predictions within 1$\sigma$ dex of reference values. Although redshift estimation in COSMOS-Web remains challenging (median $\sigma_{\mathrm{NMAD}} = 0.04$), the overall success of the SOM highlights its potential as a powerful and interpretable tool for galaxy parameter estimation.
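
The abstract quotes $\sigma_{\mathrm{NMAD}}$, the Pearson correlation, and the fraction of predictions within a tolerance of the reference values, but this page does not give their definitions. The following is a minimal sketch of how such metrics are commonly computed. It assumes the widely used convention $\sigma_{\mathrm{NMAD}} = 1.4826 \times \mathrm{median}(|\Delta - \mathrm{median}(\Delta)|)$, with $\Delta$ divided by $(1+z_{\mathrm{ref}})$ for redshift; the paper's exact definition may differ. The function names and the `is_redshift` flag are illustrative, not taken from the paper.

```python
import numpy as np

def sigma_nmad(pred, ref, is_redshift=False):
    """Normalized median absolute deviation of the residuals.

    Assumed convention (not confirmed by the source):
    1.4826 * median(|d - median(d)|), where d = pred - ref,
    divided by (1 + ref) when the quantity is a redshift.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    d = pred - ref
    if is_redshift:
        d = d / (1.0 + ref)
    return 1.4826 * np.median(np.abs(d - np.median(d)))

def pearson_r(pred, ref):
    """Pearson correlation coefficient between predictions and references."""
    return float(np.corrcoef(np.asarray(pred, float), np.asarray(ref, float))[0, 1])

def fraction_within(pred, ref, tol):
    """Fraction of predictions whose absolute residual is at most tol."""
    d = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(d) <= tol))
```

For logarithmic quantities such as stellar mass or SFR, `pred` and `ref` would be passed in dex, so that `tol` is also a tolerance in dex.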


Paper Structure

This paper contains 31 sections, 25 equations, 44 figures, 3 tables.

Figures (44)

  • Figure 1: Workflow for estimating galaxy physical properties using SOMs. The process involves two datasets: HZ-AGN simulations and CW observations. After initial preprocessing, including feature selection, magnitude cuts, and splitting the data into training and testing sets, SOMs are trained separately in four redshift bins. The SOM trained on the HZ-AGN data is applied to both the HZ-AGN and CW data, while the SOM trained on the CW data is applied to the CW data. Physical parameters such as redshift, stellar mass, SFR, sSFR, and mass-weighted age are estimated from the peak of the posterior PDF constructed using likelihood-weighted SOM cells.
  • Figure 2: Comparison of the HZ-AGN catalog and the CW catalog in the redshift range $0.1 \leq z < 3.5$. Upper panel: Magnitude-redshift distribution in F444W magnitudes. The dashed lines show the magnitude completeness for the simulation and observational data. Lower panel: Redshift distribution of the simulation and observational data. The dashed lines represent the simulation and observational data with $m_\mathrm{F444W} < 24.8$ and $m_\mathrm{F444W} < 27$, respectively. The dot-dashed line represents the $m_\mathrm{F444W} < 24.8$ magnitude cut applied to the observational data prior to applying it to the SOM trained on simulation data.
  • Figure 3: Quality score for different grid sizes, ranging from $10\times10$ to $100\times100$, for the SOM trained on simulation data. The optimal grid sizes are as follows: $70\times40$ for the redshift bin $0.1$–$0.8$, $60\times60$ for $0.8$–$1.5$ and $1.5$–$2.5$, and $40\times30$ for $2.5$–$3.5$.
  • Figure 4: Quality score for different grid sizes, ranging from $10\times10$ to $100\times100$, for the SOM trained on observational data. The optimal grid sizes are as follows: $50\times60$ for the redshift bin $0.1$–$0.8$, $50\times80$ for $0.8$–$1.5$, $60\times60$ for $1.5$–$2.5$, and $40\times40$ for $2.5$–$3.5$.
  • Figure 5: Interpolated distribution of 10-color features of the training data in the redshift bin $0.1$ to $0.8$, projected onto the SOM trained on HZ-AGN galaxies. Color ranges are normalized independently for each color, with minimum and maximum values set to the $2$nd and $98$th percentiles of each color's distribution respectively.
  • ...and 39 more figures
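
The Figure 1 caption states that each physical parameter is estimated from the peak of a posterior PDF built from likelihood-weighted SOM cells. The sketch below illustrates one plausible reading of that step, not the authors' implementation. It assumes a Gaussian likelihood computed from the $\chi^2$ distance between a galaxy's colors and each cell's prototype colors, and a single representative parameter value per cell. All names (`som_weights`, `cell_values`, `posterior_peak`) are hypothetical.

```python
import numpy as np

def cell_likelihoods(colors, color_err, som_weights):
    """Relative Gaussian likelihood of one galaxy against every SOM cell.

    colors, color_err: (n_colors,) observed colors and their uncertainties.
    som_weights: (n_cells, n_colors) prototype colors of a trained SOM.
    """
    chi2 = np.sum(((colors - som_weights) / color_err) ** 2, axis=1)
    # Subtract the minimum chi2 so the best cell has likelihood 1
    # and the exponential does not underflow for every cell.
    return np.exp(-0.5 * (chi2 - chi2.min()))

def posterior_peak(colors, color_err, som_weights, cell_values, bins):
    """Peak of the likelihood-weighted PDF of one physical parameter.

    cell_values: (n_cells,) representative parameter value per cell
                 (assumed here; NaN marks cells with no training galaxies).
    bins: bin edges used to histogram the parameter.
    """
    like = cell_likelihoods(colors, color_err, som_weights)
    ok = np.isfinite(cell_values)
    pdf, edges = np.histogram(cell_values[ok], bins=bins, weights=like[ok])
    k = int(np.argmax(pdf))
    return 0.5 * (edges[k] + edges[k + 1])  # center of the most probable bin

# Toy example: three cells in a two-color space.
som_weights = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
cell_values = np.array([0.5, 1.5, 2.5])  # e.g. a redshift label per cell
estimate = posterior_peak(
    colors=np.array([1.05, 0.95]),
    color_err=np.array([0.1, 0.1]),
    som_weights=som_weights,
    cell_values=cell_values,
    bins=np.linspace(0.0, 3.0, 4),
)
```

In the toy example the galaxy's colors lie closest to the second cell, so the peak of the weighted PDF falls in the bin that contains that cell's value.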