Data-Driven Stellar Spectral Modelling with GSPICE
Douglas P. Finkbeiner, Joshua S. Speagle, Tanveer Karim
Abstract
Spectral data reduction pipelines face a wide variety of challenges, including masking cosmic rays, calibrating wavelength solutions, and estimating background noise, while trying to remain model-agnostic. Traditional methods rely on hardware-specific code or pre-calculated stellar model templates, which makes them model-dependent and ill-suited to large datasets that may contain new classes of objects. To address this, we present a flexible, data-driven method, the GausSian PIxelwise Conditional Estimator (GSPICE), which models an ensemble of spectra as a multivariate Gaussian and estimates the expected value and variance of each pixel in each spectrum conditional on the other pixels in that spectrum. GSPICE compares the observed fluxes and errors to its own flux and error estimates to reveal outliers, which can then be masked entirely or replaced by their estimates. We apply GSPICE to 3.9 million stellar spectra from the LAMOST survey and show that variations of the method can directly identify and correct both individual pixel-level outliers (e.g., from cosmic-ray hits) and extended systematic features (e.g., from incorrect wavelength calibrations), while also providing a novel characterization of the true per-pixel measurement uncertainties. We also demonstrate how GSPICE can take advantage of data partitioning, with an application to diffuse interstellar bands. Implementations of GSPICE in both Python and IDL are available at http://github.com/dfink/gspice.
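The core idea, predicting each pixel from all the other pixels in the same spectrum via a multivariate Gaussian fitted to the ensemble, can be illustrated with a short NumPy sketch. This is a simplified illustration under stated assumptions, not the code or API of the gspice repository: it assumes all spectra share one wavelength grid, that there are more spectra than pixels so the sample covariance is invertible, and it ignores the per-pixel observed errors that GSPICE also uses. The function names `conditional_pixel_estimates` and `flag_and_replace` and the 5-sigma `threshold` are hypothetical choices made for this example.

```python
import numpy as np

def conditional_pixel_estimates(spectra):
    """Simplified sketch (not the gspice API): fit one multivariate Gaussian
    to an ensemble of spectra and return, for every pixel of every spectrum,
    the mean and variance of that pixel conditional on all other pixels.

    spectra: array of shape (n_spectra, n_pixels) on a common wavelength
    grid; assumes n_spectra > n_pixels so the covariance is invertible.
    """
    mu = spectra.mean(axis=0)
    resid = spectra - mu
    cov = resid.T @ resid / (spectra.shape[0] - 1)
    prec = np.linalg.inv(cov)            # precision matrix P = Sigma^{-1}
    p_ii = np.diag(prec)
    # Gaussian conditioning in precision form:
    #   E[x_i | x_-i]   = x_i - (P (x - mu))_i / P_ii
    #   Var[x_i | x_-i] = 1 / P_ii   (identical for every spectrum)
    cond_mean = spectra - (resid @ prec.T) / p_ii
    cond_var = np.broadcast_to(1.0 / p_ii, spectra.shape)
    return cond_mean, cond_var

def flag_and_replace(spectra, threshold=5.0):
    """Flag pixels lying more than `threshold` conditional standard
    deviations from their conditional mean (hypothetical cut) and
    replace the flagged values with the conditional mean."""
    cond_mean, cond_var = conditional_pixel_estimates(spectra)
    chi = (spectra - cond_mean) / np.sqrt(cond_var)
    flags = np.abs(chi) > threshold
    cleaned = np.where(flags, cond_mean, spectra)
    return flags, cleaned, chi

# Usage on synthetic data: low-rank "stellar" variation plus white noise,
# with one injected spike standing in for a cosmic-ray hit.
rng = np.random.default_rng(0)
n_spec, n_pix = 2000, 50
basis = rng.normal(size=(3, n_pix))
spectra = (1.0 + 0.1 * rng.normal(size=(n_spec, 3)) @ basis
           + 0.01 * rng.normal(size=(n_spec, n_pix)))
spectra[7, 20] += 0.2
flags, cleaned, chi = flag_and_replace(spectra)
```

Working in precision form means a single matrix inversion yields the conditional mean and variance for every pixel at once, rather than one regression per pixel. Because the prediction for pixel i never uses pixel i itself, a spike at that pixel appears as a large residual instead of being absorbed into its own estimate.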
