Emergent Denoising of SDSS Galaxy Spectra Through Unsupervised Deep Learning
Oliver Camilleri, Zahra Sharbaf, Ignacio Ferreras
TL;DR
This work tackles low-$S/N$ galaxy absorption spectra by proposing an unsupervised deep-learning denoising approach trained on a large SDSS Legacy ensemble. It compares a classical Butterworth filter (BF) baseline with four autoencoder variants (FS, NW, NW-S, CS), trained with a mean absolute error (MAE) loss and evaluated on three key line-strength indices within two spectral windows. The main finding is that the full-spectrum autoencoder (FS) yields the most faithful reconstructions, raising $S/N$ without introducing biases, whereas the CS and narrow-window variants can underperform and BF can overfit the noisy input. A SHAP explainability analysis shows that emission lines and blue continuum regions drive the model, which highlights the importance of the continuum and suggests practical benefits for ongoing and future surveys such as DESI, WEAVE, and WAVES.
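To make the two ingredients named above concrete, the following is a minimal sketch of an MAE metric and a Butterworth low-pass filter applied to a toy spectrum. It is not the authors' implementation: the filter is applied as a zero-phase gain in Fourier space, and the cutoff, the order, the noise level, and the synthetic absorption line are all assumptions chosen for illustration.

```python
import numpy as np


def mae(a, b):
    """Mean absolute error between two spectra on the same pixel grid."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))


def butterworth_lowpass(flux, cutoff, order=4):
    """Zero-phase Butterworth low-pass applied as a gain in Fourier space.

    cutoff is in cycles per pixel (0 < cutoff <= 0.5). Both cutoff and
    order are illustrative assumptions, not values from the paper.
    """
    flux = np.asarray(flux, dtype=float)
    freqs = np.fft.rfftfreq(flux.size)  # 0 ... 0.5 cycles per pixel
    # Butterworth magnitude response: |H(f)| = 1 / sqrt(1 + (f/fc)^(2n))
    gain = 1.0 / np.sqrt(1.0 + (freqs / cutoff) ** (2 * order))
    return np.fft.irfft(np.fft.rfft(flux) * gain, n=flux.size)


# Toy spectrum (assumed): flat continuum with one Gaussian absorption line.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 512)
clean = 1.0 - 0.3 * np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)
noisy = clean + rng.normal(0.0, 0.1, x.size)

smoothed = butterworth_lowpass(noisy, cutoff=0.03)

print("MAE noisy    vs clean:", mae(noisy, clean))
print("MAE smoothed vs clean:", mae(smoothed, clean))
```

On this toy case the filtered spectrum sits closer to the noiseless one than the noisy input does. A low MAE alone does not guarantee unbiased line strengths, which is why the comparison against synthetic data matters.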
Abstract
Spectroscopy is the ideal observational method for extracting the most information from galaxies regarding their star formation and chemical enrichment histories. However, absorption spectra of galaxies are challenging to obtain at high redshift or in low-mass galaxies, because the photons must be spread over a relatively large set of spectral bins. For this reason, the data from many state-of-the-art spectroscopic surveys suffer from low signal-to-noise (S/N) ratios, which prevents accurate estimates of the stellar population parameters. In this paper, we tackle the issue of denoising an ensemble of spectra with unsupervised Deep Learning techniques trained on a homogeneous sample of spectra over a wide range of S/N. These methods reconstruct spectra at a higher S/N and allow us to investigate the potential of Deep Learning to faithfully reproduce spectra from incomplete data. Our methodology is tested on three key line strengths and is compared with synthetic data to assess retrieval biases. The results suggest that a standard Autoencoder is a very powerful method that does not introduce systematics in the reconstruction. We also note how careful the analysis needs to be, as other methods can, on a quick check, produce spectra that appear noiseless but are in fact strongly biased towards a simple overfitting of the noisy input. Denoising methods with minimal bias will maximise the quality of ongoing and future spectral surveys such as DESI, WEAVE, or WAVES.
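The bias test on line strengths can be illustrated with a short sketch: measure an equivalent width on a noiseless synthetic spectrum, then on a reconstruction that looks smooth, and compare. Everything specific here is hypothetical: the band limits, the Gaussian line, and the boxcar-smoothed "reconstruction" are stand-ins, not the indices or methods used in the paper.

```python
import numpy as np


def band_mean(wave, flux, lo, hi):
    """Average flux inside the wavelength band [lo, hi]."""
    sel = (wave >= lo) & (wave <= hi)
    return float(np.mean(flux[sel]))


def equivalent_width(wave, flux, line, blue, red):
    """Equivalent width in wavelength units.

    Uses a flat pseudo-continuum: the mean of the blue and red
    side-band averages (a simplification assumed for this sketch).
    """
    cont = 0.5 * (band_mean(wave, flux, *blue) + band_mean(wave, flux, *red))
    sel = (wave >= line[0]) & (wave <= line[1])
    w = wave[sel]
    depth = 1.0 - flux[sel] / cont
    # Trapezoidal integration of the fractional depth over the line band.
    return float(np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(w)))


# Hypothetical bands and synthetic line (not taken from the paper).
LINE, BLUE, RED = (4990.0, 5010.0), (4955.0, 4975.0), (5025.0, 5045.0)
wave = np.linspace(4950.0, 5050.0, 1001)  # 0.1 Angstrom sampling
truth = 1.0 - 0.3 * np.exp(-0.5 * ((wave - 5000.0) / 2.0) ** 2)

# A reconstruction that looks noiseless but is over-smoothed:
# a 30 Angstrom boxcar applied to the line profile only, so the
# continuum at the array edges is left untouched.
kernel = np.ones(301) / 301.0
smooth = 1.0 + np.convolve(truth - 1.0, kernel, mode="same")

ew_true = equivalent_width(wave, truth, LINE, BLUE, RED)
ew_smooth = equivalent_width(wave, smooth, LINE, BLUE, RED)

print("EW truth :", ew_true)
print("EW smooth:", ew_smooth)
print("bias     :", ew_smooth - ew_true)
```

The smoothed spectrum is visually clean, yet its equivalent width is systematically lower because part of the absorption has been pushed outside the line band. Comparing measured indices against a known synthetic truth exposes this kind of bias, which a visual inspection would miss.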
