Emergent Denoising of SDSS Galaxy Spectra Through Unsupervised Deep Learning

Oliver Camilleri, Zahra Sharbaf, Ignacio Ferreras

TL;DR

This work tackles the problem of low-$S/N$ galaxy absorption spectra by proposing an unsupervised deep-learning denoising approach trained on a large SDSS Legacy ensemble. It compares a classical Butterworth filter (BF) baseline with four deep-learning (DL) autoencoder variants (FS, NW, NW-S, CS), using a mean absolute error (MAE) loss and evaluating on three key line-strength indices within two spectral windows. The main finding is that a full-spectrum autoencoder (FS) yields the most faithful reconstructions at higher $S/N$ without introducing biases, whereas the CS and narrow-window variants can underperform and BF can overfit the noisy input. The study also uses SHAP explainability to show that emission lines and blue continuum regions drive the model, highlighting the important role of the continuum and suggesting practical benefits for ongoing and upcoming surveys such as DESI, WEAVE, and WAVES.
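As a minimal sketch of the MAE loss named above: the function below averages the absolute flux differences between a reconstructed spectrum and its target over a shared wavelength grid. The function name `mae_loss` and the toy flux values are illustrative assumptions. In an autoencoder the target is typically the input spectrum itself, and any masking or weighting the authors may apply is not described in this summary.

```python
from typing import Sequence


def mae_loss(reconstructed: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two spectra sampled on the same wavelength grid."""
    if len(reconstructed) != len(target):
        raise ValueError("spectra must share the same wavelength grid")
    if len(target) == 0:
        raise ValueError("spectra must not be empty")
    # Average the per-pixel absolute flux differences.
    return sum(abs(r - t) for r, t in zip(reconstructed, target)) / len(target)


# Toy fluxes in arbitrary units (illustrative values, not SDSS data).
target = [1.0, 0.8, 0.5, 0.9]
reconstructed = [1.1, 0.7, 0.5, 1.1]
loss = mae_loss(reconstructed, target)
```

Compared with a squared-error loss, MAE penalises large single-pixel deviations less heavily, which is one common reason to choose it for noisy data.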

Abstract

Spectroscopy represents the ideal observational method to maximally extract information from galaxies regarding their star formation and chemical enrichment histories. However, absorption spectra of galaxies prove rather challenging at high redshift or in low-mass galaxies, due to the need to spread the photons into a relatively large set of spectral bins. For this reason, the data from many state-of-the-art spectroscopic surveys suffer from low signal-to-noise (S/N) ratios, which prevent accurate estimates of the stellar population parameters. In this paper, we tackle the issue of denoising an ensemble by the use of unsupervised Deep Learning techniques trained on a homogeneous sample of spectra over a wide range of S/N. These methods reconstruct spectra at a higher S/N and allow us to investigate the potential for Deep Learning to faithfully reproduce spectra from incomplete data. Our methodology is tested on three key line strengths and is compared with synthetic data to assess retrieval biases. The results suggest a standard Autoencoder as a very powerful method that does not introduce systematics in the reconstruction. We also note in this work how careful the analysis needs to be, as other methods can, on a quick check, produce spectra that appear noiseless but are in fact strongly biased towards a simple overfitting of the noisy input. Denoising methods with minimal bias will maximise the quality of ongoing and future spectral surveys such as DESI, WEAVE, or WAVES.

Paper Structure

This paper contains 8 sections, 5 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: A 9-6-3-6-9 autoencoding network with an added skip connection between two hidden layer neurons. The input is embedded within a learned space of lower-dimensionality known as a latent space. From this representation, the input is reconstructed in the output layer.
  • Figure 2: Residual statistic ($\Delta$) estimated at S/N=5 for the reconstruction of SDSS galaxy spectra (left) and synthetic data (right). The horizontal dashed line represents the residual $\Delta$ for the comparison between the observed data and the best fit spectra. See text for details.
  • Figure 3: Standard deviation of the residuals of three line strengths, as labelled, shown with respect to the S/N in the SDSS-$g$ band (left) and the actual line measurement (right). They correspond to synthetic data with added noise (see text for details). The comparisons are made between the recovered spectra and the original, noiseless data ($\Delta_O$, top) or the noisy input ($\Delta_N$, bottom).
  • Figure 4: The limitations of the CS model are illustrated using the mean errors for each wavelength. To produce this plot, the absolute differences between model reconstructions and corresponding ground truth SDSS test set spectra were computed and then averaged.
  • Figure 5: Example of the recovery of a spectrum with original S/N$\sim$5. Note how the Butterworth Filter (BF) optimises the residuals with respect to the input (noisy) data, i.e. overfits, whereas FS improves the residuals with respect to the original (noiseless) spectra.
  • ...and 4 more figures
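
The 9-6-3-6-9 topology in the Figure 1 caption can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, biases are omitted, the ReLU activation is an assumption, and the skip connection is modelled as a layer-wise addition from the first 6-unit hidden layer to the second, whereas the caption describes a link between two individual hidden neurons. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

LAYER_SIZES = [9, 6, 3, 6, 9]  # topology taken from the Figure 1 caption


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random, untrained weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights: Matrix, x: Vector, activate: bool = True) -> Vector:
    """Matrix-vector product, followed by ReLU when activate is True (assumed)."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, o) for o in out] if activate else out


def forward(weights: List[Matrix], x: Vector) -> Tuple[Vector, Vector]:
    """Return the 3-dimensional latent code and the 9-dimensional reconstruction."""
    h1 = dense(weights[0], x)             # encoder: 9 -> 6
    latent = dense(weights[1], h1)        # bottleneck: 6 -> 3 (latent space)
    h2 = dense(weights[2], latent)        # decoder: 3 -> 6
    h2 = [a + b for a, b in zip(h2, h1)]  # assumed additive skip connection
    out = dense(weights[3], h2, activate=False)  # linear output layer: 6 -> 9
    return latent, out


rng = random.Random(0)
weights = [
    init_layer(n_in, n_out, rng)
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])
]
# Toy 9-pixel "spectrum" in arbitrary flux units (not SDSS data).
spectrum = [rng.uniform(0.5, 1.5) for _ in range(LAYER_SIZES[0])]
latent, reconstruction = forward(weights, spectrum)
```

The bottleneck forces the input through a 3-dimensional representation, while the skip connection lets information bypass it, so a trained network of this kind balances compression against reconstruction detail.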