Unsupervised Discovery of High-Redshift Galaxy Populations with Variational Autoencoders
Aayush Saxena
TL;DR
This work tackles the problem of discovering rare high-redshift galaxy populations from JWST spectra without labeled data by applying a variational autoencoder to map spectra into a 16‑dimensional latent space, followed by UMAP visualization and Gaussian mixture clustering. The approach yields 12 well-separated clusters corresponding to five astrophysically interesting classes, substantially increasing known samples (e.g., 326 quenched galaxies, 213 LAEs, 180 EELGs, 320 High-z analogs, 142 LRDs) and enabling automated interpretation of spectral features. The findings demonstrate the power of unsupervised representation learning for large spectroscopic surveys and provide a scalable pathway to rapid discovery and characterization of early-universe galaxy populations. This framework can be integrated into JWST data pipelines to accelerate statistical studies of galaxy formation and evolution in the first 1.5 billion years after the Big Bang.
Abstract
We apply variational autoencoders to automatically discover galaxy populations using publicly available high-redshift \textit{JWST} spectra without prior classification knowledge. Our unsupervised method identifies distinct astrophysical classes of unique and exciting galaxy types, demonstrating automated discovery capabilities for large spectroscopic surveys.
