Stellar Karaoke: deep blind separation of terrestrial atmospheric effects out of stellar spectra by velocity whitening

Nima Sedaghat, Brianna M. Smart, J. Bryce Kalmbach, Erin L. Howard, Hamidreza Amindavar

TL;DR

The authors report ‘Stellar Karaoke’, an approach that needs no prior knowledge of parameters such as observation time, location, or the distribution of atmospheric molecules, and that processes each spectrum in milliseconds.

Abstract

We report a study exploring how deep neural networks applied to astronomical Big Data may help uncover new insights into underlying phenomena. During experiments on unsupervised knowledge extraction from astronomical Big Data, we serendipitously found that deep convolutional autoencoders tend to reject telluric lines in stellar spectra. Further experiments showed that only when the spectra are in the barycentric frame does the network automatically identify the statistical independence between the two components, stellar and telluric, and reject the latter. We exploit this finding and turn it into a proof-of-concept method for removing telluric lines from stellar spectra in a fully unsupervised fashion: we increase the inter-observation entropy of telluric absorption lines by imposing a random, virtual radial velocity on each observed spectrum. This technique results in a non-standard form of ‘whitening’ of the atmospheric components of the spectrum, decorrelating them across multiple observations. We process more than 250,000 spectra from the High Accuracy Radial velocity Planet Searcher (HARPS) and show, with qualitative and quantitative evaluations against a database of known telluric lines, that most of the telluric lines are successfully rejected. Our approach, ‘Stellar Karaoke’, needs no prior knowledge of parameters such as observation time, location, or the distribution of atmospheric molecules, and processes each spectrum in milliseconds. We also train and test on Sloan Digital Sky Survey (SDSS) spectra and see a significant performance drop due to their low resolution. We discuss directions for developing tools on top of the introduced method in the future.
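
The core operation described in the abstract, imposing a random virtual radial velocity on an observed spectrum, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic Gaussian line, the non-relativistic Doppler formula, and the linear resampling are all assumptions chosen for illustration. Only the idea of a random velocity shift and the ±30 km/s range (quoted in the Figure 4 caption) come from the source.

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light in km/s


def apply_virtual_velocity(wavelength, flux, v_km_s):
    """Doppler-shift a spectrum by v_km_s and resample it onto the original grid.

    Assumption: non-relativistic shift, lambda' = lambda * (1 + v / c),
    followed by linear interpolation.
    """
    shifted_wavelength = wavelength * (1.0 + v_km_s / C_KM_S)
    return np.interp(wavelength, shifted_wavelength, flux)


def synthetic_line(wavelength, centre, depth=0.5, sigma=0.05):
    """Hypothetical Gaussian absorption line on a flat, unit continuum."""
    return 1.0 - depth * np.exp(-0.5 * ((wavelength - centre) / sigma) ** 2)


rng = np.random.default_rng(0)
wavelength = np.linspace(6275.0, 6285.0, 2001)  # Å, 0.005 Å spacing
flux = synthetic_line(wavelength, 6280.0)       # stand-in for a telluric line

# Draw one virtual velocity from U(-30, 30) km/s and apply it.
v = rng.uniform(-30.0, 30.0)
shifted_flux = apply_virtual_velocity(wavelength, flux, v)

print("virtual velocity [km/s]:", v)
print("line centre after shift [Å]:", wavelength[np.argmin(shifted_flux)])
```

Repeating this with an independent velocity per observation moves a line that sat at a fixed wavelength in every spectrum to a different position in each one, which is the decorrelation across observations that the abstract refers to.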

Paper Structure (27 sections, 22 equations, 12 figures)

This paper contains 27 sections, 22 equations, 12 figures.

Figures (12)

  • Figure 1: We exploit the statistical properties of stellar spectra in large datasets, pass them through a convolutional autoencoder, and get telluric lines rejected with minimal effort. The three red arrows in the figure highlight spectral lines that the autoencoder identified as telluric lines and did not include in the reconstructed spectrum.
  • Figure 2: A typical variational autoencoder trained to reconstruct stellar spectra can decompose the input into physically meaningful components when the compression is high enough and the number of convolutional kernels is kept low. The variational implementation of the bottleneck is omitted for simplicity. The main contraction and expansion parts on the left and right are composed of (up)convolutional layers, while fully connected layers are necessary at the bottleneck. Blue is the input spectrum and orange is the reconstructed version. This image was originally published in sedaghat2021machines.
  • Figure 3: On the left a set of exemplar spectra are depicted. Stellar lines are unaligned due to different radial velocities. But telluric lines are aligned, even though they may have different shapes due to their inherent time dependence. On the right the same spectra after velocity randomisation are depicted. Telluric lines are now unaligned too, but with a pattern different to that of stellar lines. The red and green dotted vertical lines indicate the location of exemplar telluric and stellar lines, respectively. More telluric lines can be seen to the right of the marked one.
  • Figure 4: Visual illustration of the covariance matrix of the ensemble of signals, before and after whitening. On the left, the covariance matrix of the [6275, 6285] Å region is visualised for a subset of 90 observations. On the right, the same is done after velocity whitening, where the $v_i$ were sampled from the uniform distribution $V \sim U(-30\text{ km/s},\ 30\text{ km/s})$. The covariance matrix is now closer to the identity matrix, confirming that some degree of whitening/decorrelation has been achieved.
  • Figure 5: Qualitative illustration of the results on an exemplar selection of HARPS spectra. Each row depicts one spectrum, with its HARPS ID written on the right, while columns focus on different regions of interest (indicated as grey regions on the left). Interpretation hint: the reconstruction (orange) should follow the pseudo-truth (green; see the quantitative evaluation section for its definition). The left-most column covers the whole spectrum, as it is fed into the network, and illustrates the robustness of the network to different characteristics of the spectra (continuum, noise level, etc.). Major stellar lines such as $H\alpha$ can be easily spotted in the less 'busy' examples. In the middle column we zoom in on a potentially complex region where narrow stellar lines, when present, can collide with telluric lines of similar shapes. For example, the second and fourth rows show such cases, where the network rejects the telluric component while still preserving the stellar part very well, thus rejecting the hypothesis that it might simply be removing narrow lines by applying a moving average. In the third column, we focus on a region where stellar lines occur in some of the spectra but not in others. This way we reject the hypothesis that the network might have 'memorised' the locations of the lines.
  • ...and 7 more figures
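
The decorrelation described in the Figure 4 caption can be checked numerically on toy data. The sketch below is an assumption-laden illustration, not the paper's procedure: the spectra are synthetic (one Gaussian line with a random depth per observation, plus small noise), and the diagnostic, the share of total variance carried by the largest eigenvalue of the covariance matrix, is a choice made here rather than a metric taken from the source. Only the wavelength window, the subset size of 90, and the $U(-30, 30)$ km/s velocity range follow the caption.

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light in km/s


def leading_eigen_fraction(spectra):
    """Share of total variance carried by the largest covariance eigenvalue.

    Values near 1 mean the pixels vary together (strongly correlated);
    an identity-like covariance would give roughly 1 / n_pixels.
    """
    cov = np.cov(spectra, rowvar=False)  # pixels are the variables
    eigvals = np.linalg.eigvalsh(cov)    # returned in ascending order
    return eigvals[-1] / eigvals.sum()


rng = np.random.default_rng(42)
wavelength = np.linspace(6275.0, 6285.0, 501)  # Å, window from the caption
n_obs = 90                                     # subset size from the caption

# Hypothetical telluric line: fixed position, depth varies per observation.
depths = rng.uniform(0.2, 0.8, size=n_obs)
profile = np.exp(-0.5 * ((wavelength - 6280.0) / 0.05) ** 2)
observed = 1.0 - depths[:, None] * profile[None, :]
observed += rng.normal(0.0, 0.002, size=observed.shape)

# Velocity whitening: one random virtual velocity per observation,
# applied as a non-relativistic Doppler shift with linear resampling.
velocities = rng.uniform(-30.0, 30.0, size=n_obs)
whitened = np.stack([
    np.interp(wavelength, wavelength * (1.0 + v / C_KM_S), flux)
    for v, flux in zip(velocities, observed)
])

before = leading_eigen_fraction(observed)
after = leading_eigen_fraction(whitened)
print("leading eigenvalue share before whitening:", before)
print("leading eigenvalue share after whitening:", after)
```

In this toy setting the share drops substantially after the random shifts, because the line no longer occupies the same pixels in every observation. Real spectra contain many telluric lines that shift together, so the resulting covariance would be expected to show more structure than this single-line example.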