Unsupervised Discovery of High-Redshift Galaxy Populations with Variational Autoencoders

Aayush Saxena

TL;DR

This work tackles the problem of discovering rare high-redshift galaxy populations in JWST spectra without labeled data. A variational autoencoder (VAE) maps each spectrum into a 16-dimensional latent space. The latent representations are then visualized with UMAP and grouped with Gaussian mixture clustering. The approach yields 12 well-separated clusters that correspond to five astrophysically interesting classes. It substantially increases the known samples, identifying 326 quenched galaxies, 213 Lyman-alpha emitters (LAEs), 180 extreme emission-line galaxies (EELGs), 320 High-z analogs, and 142 little red dots (LRDs), and it enables automated interpretation of spectral features. These results show that unsupervised representation learning is effective for large spectroscopic surveys and offers a scalable route to rapidly discovering and characterizing early-universe galaxy populations. The framework could be integrated into JWST data pipelines to accelerate statistical studies of galaxy formation and evolution in the first 1.5 billion years after the Big Bang.
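The summary names the model but does not give its training objective. The sketch below writes out the two core pieces in NumPy, assuming the standard VAE objective (the negative evidence lower bound). It further assumes a unit-variance Gaussian likelihood, a standard-normal prior, and a per-pixel mask, so that the missing or masked regions noted in Figure 1 do not contribute to the loss. The function names, the `beta` weight, and the toy data are hypothetical illustrations, not the authors' implementation. Only the 16-dimensional latent size comes from the text.

```python
import numpy as np

LATENT_DIM = 16  # latent dimensionality quoted in the TL;DR


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def masked_negative_elbo(x, x_hat, mask, mu, log_var, beta=1.0):
    """Per-spectrum negative ELBO, up to an additive constant.

    Hypothetical sketch: assumes a unit-variance Gaussian likelihood per pixel
    and a standard-normal prior. x, x_hat, mask have shape (n_spectra, n_pixels),
    with mask = 1 for valid pixels and 0 for missing/masked ones.
    mu, log_var have shape (n_spectra, LATENT_DIM).
    """
    # Reconstruction term: squared error summed over valid pixels only,
    # so masked pixels neither help nor hurt the fit.
    recon = 0.5 * np.sum(mask * (x - x_hat) ** 2, axis=1)
    # Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)).
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return recon + beta * kl


# Toy usage: 4 fake "spectra" of 50 pixels, with pixels 10-14 masked.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 50))
mask = np.ones_like(x)
mask[:, 10:15] = 0.0
mu = np.zeros((4, LATENT_DIM))
log_var = np.zeros((4, LATENT_DIM))
z = reparameterize(mu, log_var, rng)                  # latent samples, shape (4, 16)
loss = masked_negative_elbo(x, x, mask, mu, log_var)  # perfect fit, prior matched: all zeros
```

Because the mask multiplies the residuals, the decoder is free to output anything in masked regions. This is consistent with the caption's observation that the reconstructions "fill in" missing data, although the paper's actual handling of masked pixels is not stated here.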

Abstract

We apply variational autoencoders to publicly available high-redshift JWST spectra to discover galaxy populations automatically, without any prior classification. Our unsupervised method identifies distinct astrophysical classes, including rare and scientifically interesting galaxy types, demonstrating its capability for automated discovery in large spectroscopic surveys.
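For the clustering step, a minimal sketch is an expectation-maximization (EM) fit of a Gaussian mixture to the latent vectors. The sketch makes several assumptions:
- The mixture is fitted in the 16-dimensional latent space. The summary does not say whether it operates on the latent vectors or on the UMAP embedding.
- Covariances are diagonal and the means use farthest-point initialization. Both are illustrative choices, not details from the paper.
- `fit_diag_gmm` is a hypothetical helper. In practice, a library implementation such as scikit-learn's `GaussianMixture` (for example with `n_components=12`) would normally be used instead.

```python
import numpy as np


def fit_diag_gmm(z, k, n_iter=100, seed=0, eps=1e-6):
    """Fit a k-component Gaussian mixture with diagonal covariances to z via EM.

    Hypothetical helper. z: (n, d) latent vectors.
    Returns (weights, means, variances, labels).
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape
    # Assumption: farthest-point initialization spreads the initial means out.
    idx = [int(rng.integers(n))]
    for _ in range(k - 1):
        dist = np.min(((z[:, None, :] - z[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(dist.argmax()))
    means = z[idx].copy()
    variances = np.tile(z.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log N(z_i | mean_j, diag(var_j)) + log weight_j for every (i, j).
        diff = z[:, None, :] - means[None, :, :]  # (n, k, d)
        log_prob = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
        log_prob += np.log(weights)
        # Normalize with log-sum-exp for numerical stability.
        log_norm = np.logaddexp.reduce(log_prob, axis=1, keepdims=True)
        resp = np.exp(log_prob - log_norm)  # (n, k) responsibilities
        # M-step: re-estimate mixture weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / n
        means = (resp.T @ z) / nk[:, None]
        diff = z[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + eps
    # Hard labels from the last E-step: each point's most probable component.
    return weights, means, variances, resp.argmax(axis=1)


# Toy usage: two well-separated blobs standing in for 16-dimensional latent vectors.
rng = np.random.default_rng(1)
z = np.vstack([rng.normal(0.0, 1.0, (50, 16)), rng.normal(10.0, 1.0, (50, 16))])
weights, means, variances, labels = fit_diag_gmm(z, k=2)
```

The summary does not describe how the number of components was chosen, so this sketch simply takes `k` as an input.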

Paper Structure

This paper contains 10 sections, 2 equations, and 2 figures.

Figures (2)

  • Figure 1: Comparison of input (blue) and reconstructed (orange) spectra, one drawn from each quartile of the reconstruction-error distribution, with accuracy decreasing clockwise from top left. The reconstruction often predicts flux where the input data are missing or masked.
  • Figure 2: Left: Observed, median-combined spectra of five notable high-redshift galaxy types identified with our VAE and clustering approach. The diversity of these spectra reflects the range of physical processes that shape their continua and emission lines, offering insight into galaxy evolution. Right: Redshift distributions of the clusters, showing that the galaxy types that emerge naturally from the VAE correlate with redshift.
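Both figure captions rely on simple array operations. Figure 1 selects example spectra by reconstruction-error quartile, and Figure 2 median-combines the spectra in each cluster. The sketch below makes several assumptions:
- The captions do not say which spectrum is shown within each quartile, so here it is the member with the median error.
- Spectra are assumed to have already been resampled onto a common rest-frame wavelength grid before stacking.
- Masked pixels are excluded from the median.

Both function names are hypothetical.

```python
import numpy as np


def quartile_representatives(errors):
    """Hypothetical helper: one spectrum index per reconstruction-error quartile.

    Within each quartile, the member with the median error is chosen
    (an assumption; the caption does not specify the selection rule).
    """
    errors = np.asarray(errors)
    edges = np.quantile(errors, [0.25, 0.5, 0.75])
    bins = np.digitize(errors, edges)  # 0..3: quartile of each error
    reps = []
    for q in range(4):
        members = np.flatnonzero(bins == q)
        if members.size == 0:  # possible with heavily tied errors
            continue
        order = members[np.argsort(errors[members])]
        reps.append(int(order[len(order) // 2]))
    return reps


def median_stack(flux, mask, labels, cluster):
    """Hypothetical helper: median-combine one cluster's spectra, ignoring masked pixels.

    Assumes flux and mask have shape (n_spectra, n_pixels) on a shared
    rest-frame grid; a column with no valid pixels yields NaN.
    """
    flux, mask, labels = np.asarray(flux), np.asarray(mask), np.asarray(labels)
    sel = labels == cluster
    masked = np.where(mask[sel].astype(bool), flux[sel], np.nan)
    return np.nanmedian(masked, axis=0)


# Toy usage with made-up reconstruction errors.
print(quartile_representatives(np.arange(100.0)))  # [12, 37, 62, 87]
```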