Identifying Anomalous DESI Galaxy Spectra with a Variational Autoencoder

C. Nicolaou, R. P. Nathan, O. Lahav, A. Palmese, A. Saintonge, J. Aguilar, S. Ahlen, C. Allende Prieto, S. Bailey, S. BenZvi, D. Bianchi, A. Brodzeller, D. Brooks, T. Claybaugh, A. de la Macorra, J. Della Costa, Arjun Dey, P. Doel, J. E. Forero-Romero, E. Gaztañaga, S. Gontcho A Gontcho, G. Gutierrez, K. Honscheid, C. Howlett, M. Ishak, R. Kehoe, D. Kirkby, T. Kisner, A. Kremin, A. Lambert, M. Landriau, L. Le Guillou, A. Meisner, R. Miquel, J. Moustakas, S. Nadathur, F. Prada, I. Pérez-Ràfols, G. Rossi, E. Sanchez, M. Schubnell, M. Siudek, D. Sprayberry, G. Tarlé, B. A. Weaver, H. Zou

TL;DR

This work shows how Machine Learning, in particular Variational Autoencoders (VAE), can detect anomalies in a sample of approximately 200,000 DESI spectra comprising galaxies, quasars and stars, and demonstrates that the VAE can compress the dimensionality of a spectrum by a factor of 100, while still retaining enough information to accurately reconstruct spectral features.

Abstract

The tens of millions of spectra being captured by the Dark Energy Spectroscopic Instrument (DESI) provide tremendous discovery potential. In this work we show how Machine Learning, in particular Variational Autoencoders (VAE), can detect anomalies in a sample of approximately 200,000 DESI spectra comprising galaxies, quasars and stars. We demonstrate that the VAE can compress the dimensionality of a spectrum by a factor of 100, while still retaining enough information to accurately reconstruct spectral features. We then detect anomalous spectra as those with high reconstruction error and those which are isolated in the VAE latent representation. The anomalies identified fall into two categories: spectra with artefacts and spectra with unique physical features. Awareness of the former can help to improve the DESI spectroscopic pipeline; whilst the latter can lead to the identification of new and unusual objects. To further curate the list of outliers, we use the Astronomaly package which employs Active Learning to provide personalised outlier recommendations for visual inspection. In this work we also explore the VAE latent space, finding that different object classes and subclasses are separated despite being unlabelled. We demonstrate the interpretability of this latent space by identifying tracks within it that correspond to various spectral characteristics. For example, we find tracks that correspond to increasing star formation and increase in broad emission lines along the Balmer series. In upcoming work we hope to apply the methods presented here to search for both systematics and astrophysically interesting objects in much larger datasets of DESI spectra.
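The two anomaly criteria named in the abstract (high reconstruction error and isolation in the latent representation) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the mean-squared-error metric, the k-nearest-neighbour distance used as an isolation measure, the function names, the 1% flagging fraction, and the toy data. The toy sizes (1000 pixels, 10 latent variables) are chosen only to mirror the quoted factor-of-100 compression.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Mean squared error per spectrum (rows are spectra, columns are pixels)."""
    return np.mean((x - x_hat) ** 2, axis=1)

def latent_isolation(z, k=5):
    """Mean Euclidean distance from each latent point to its k nearest neighbours.

    Assumed isolation measure; uses O(N^2) memory, so only suitable for small N.
    """
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)

def flag_anomalies(score, fraction=0.01):
    """Indices of the highest-scoring spectra, most anomalous first."""
    n_flag = max(1, int(round(fraction * len(score))))
    return np.argsort(score)[::-1][:n_flag]

# Toy data standing in for spectra, reconstructions and latent means.
rng = np.random.default_rng(0)
n_spectra, n_pixels, n_latent = 200, 1000, 10
x = rng.normal(size=(n_spectra, n_pixels))
x_hat = x + 0.05 * rng.normal(size=x.shape)  # good reconstructions
x_hat[7] += 1.0                              # spectrum 7 is badly reconstructed
z = rng.normal(size=(n_spectra, n_latent))
z[42] += 25.0                                # spectrum 42 is isolated in latent space

by_error = flag_anomalies(reconstruction_error(x, x_hat))
by_isolation = flag_anomalies(latent_isolation(z))
print("Flagged by reconstruction error:", by_error)
print("Flagged by latent isolation:", by_isolation)
```

In this toy setup the two criteria flag different spectra, which is why it can be useful to keep both lists before any visual inspection or further curation.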

Paper Structure

This paper contains 21 sections, 6 equations, 18 figures, 1 table.

Figures (18)

  • Figure 1: Example architecture of a Variational Autoencoder (VAE). The encoder is tasked with mapping the input, $x$, onto the lower dimensional latent space, $z$, as a normal distribution described by two vectors containing the latent mean $\mu$ and latent standard deviation $\sigma$. The decoder is tasked with decompressing samples from the latent representation and producing reconstructions, $\hat{x}$, of the original data.
  • Figure 2: Original (black) and reconstructed (blue) spectrum based on 10 latent variables (in this and subsequent figures) of an elliptical galaxy from the validation set. The median pixel S/N of the galaxy is $33$. The spectrum is in the rest frame. The dashed grey lines indicate typical spectral lines labelled at the top of the plot. Target ID: 39627836461419062.
  • Figure 4: Original (black) and reconstructed (blue) spectrum of an emission-line galaxy from the validation set. The median pixel S/N of the galaxy is $14$. The spectrum is in the rest frame. The dashed grey lines indicate typical spectral lines labelled at the top of the plot. Target ID: 39627974831507323.
  • Figure 6: Original (black) and reconstructed (blue) spectrum of an AGN from the validation set. The median pixel S/N of the galaxy is $17$. The spectrum is in the rest frame. The dashed grey lines indicate typical spectral lines labelled at the top of the plot. Target ID: 39633118658822194.
  • Figure 8: Original (black) and reconstructed (blue) spectrum of a star (M type) from the validation set. The median pixel S/N of the spectrum is $27$. The spectrum is in the rest frame. The dashed grey lines indicate typical spectral lines labelled at the top of the plot. Target ID: 39627582181739826.
  • ...and 13 more figures
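
The encoder and decoder roles described in the Figure 1 caption can be sketched as a single forward pass. This is a minimal NumPy illustration under stated assumptions, not the architecture used in the paper: the layer sizes, the tanh activations, the random untrained weights, the log-variance parameterisation of $\sigma$, and the squared-error reconstruction term are all hypothetical choices. Only the overall structure (encode $x$ to $\mu$ and $\sigma$, sample $z$, decode to $\hat{x}$) follows the caption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_hidden, n_latent = 1000, 64, 10  # illustrative sizes (assumed)

def init_layer(n_in, n_out):
    """Random weights and zero biases for one dense layer (untrained)."""
    weights = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return weights, np.zeros(n_out)

W_enc, b_enc = init_layer(n_pixels, n_hidden)
W_mu, b_mu = init_layer(n_hidden, n_latent)
W_logvar, b_logvar = init_layer(n_hidden, n_latent)
W_dec1, b_dec1 = init_layer(n_latent, n_hidden)
W_dec2, b_dec2 = init_layer(n_hidden, n_pixels)

def encode(x):
    """Map spectra x to the latent mean and log-variance (log sigma^2)."""
    h = np.tanh(x @ W_enc + b_enc)
    return h @ W_mu + b_mu, h @ W_logvar + b_logvar

def sample_latent(mu, log_var):
    """Reparameterisation: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(mu.shape)

def decode(z):
    """Map latent samples z back to reconstructed spectra x_hat."""
    h = np.tanh(z @ W_dec1 + b_dec1)
    return h @ W_dec2 + b_dec2

def vae_loss(x, x_hat, mu, log_var):
    """Squared-error reconstruction term plus KL divergence to N(0, I)."""
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return np.mean(recon + kl)

x = rng.normal(size=(4, n_pixels))  # four toy "spectra"
mu, log_var = encode(x)
z = sample_latent(mu, log_var)
x_hat = decode(z)
loss = vae_loss(x, x_hat, mu, log_var)
print(z.shape, x_hat.shape, float(loss))
```

Predicting the log-variance instead of $\sigma$ directly is a common convenience because it keeps $\sigma$ positive without a constraint; a trained model would update the weights by minimising `vae_loss` with a gradient-based optimiser, which this sketch omits.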