Stellar parameter prediction and spectral simulation using machine learning
Vojtěch Cvrček, Martino Romaniello, Radim Šára, Wolfram Freudling, Pascal Ballester
TL;DR
This work addresses the need for fast, accurate extraction of stellar parameters from high-resolution spectra while also enabling realistic spectral simulation. By combining supervised and semi-supervised autoencoder architectures with a physics-informed spectral simulator, the authors achieve mean Teff errors of around 50 K and [M/H] and log g errors of approximately 0.03 dex and 0.04 dex, respectively, while reducing per-spectrum processing time to the millisecond regime on GPUs. The approach uses a semi-supervised latent space that separates label-informed factors from unknown ones, and introduces generative metrics (RVIS and GIS) to quantify cause-and-effect fidelity in spectral generation. The results show that label-aware models can rival traditional methods in accuracy and scale efficiently to massive surveys, and that simulated data provide meaningful benefits when labeled data are sparse. Together, these results point to a practical path toward high-throughput spectroscopic analysis.
Abstract
We applied machine learning to the entire data history of ESO's High Accuracy Radial Velocity Planet Searcher (HARPS) instrument. Our primary goal was to recover the physical properties of the observed objects, with a secondary emphasis on simulating spectra. We systematically investigated the impact of various factors on the accuracy and fidelity of the results, including the use of simulated data, the effect of varying amounts of real training data, network architectures, and learning paradigms. Our approach integrates supervised and unsupervised learning techniques within autoencoder frameworks. Our methodology builds on an existing simulation model that combines a library of stellar spectra, in which the emerging flux is computed from physical first principles, with a HARPS instrument model to generate simulated spectra comparable to observational data. We trained standard and variational autoencoders on HARPS data to predict spectral parameters and generate spectra. Our models excel at predicting spectral parameters and compressing real spectra, and they achieved a mean prediction error of approximately 50 K for effective temperatures, making them relevant for most astrophysical applications. Furthermore, the models predict metallicity ([M/H]) and surface gravity (log g) with accuracies of approximately 0.03 dex and 0.04 dex, respectively, underscoring their broad applicability in astrophysical research. The models' computational efficiency, with processing times of 779.6 ms on CPU and 3.97 ms on GPU, makes them valuable for high-throughput applications such as massive spectroscopic surveys and large archival studies. By achieving accuracy comparable to classical methods with significantly reduced computation time, our methodology enhances the scope and efficiency of spectroscopic analysis.
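To put the quoted timings in perspective, the following minimal sketch works through the arithmetic. It assumes the 779.6 ms and 3.97 ms figures are per-spectrum costs and that spectra are processed one after another at a fixed cost. The archive size `N_SPECTRA` and the helper names `speedup` and `total_hours` are hypothetical, chosen only for illustration, and do not come from the paper.

```python
# Inference times quoted in the abstract, in milliseconds
# (assumed here to be per-spectrum costs).
CPU_MS = 779.6
GPU_MS = 3.97

# Hypothetical archive size, for illustration only (not a figure from the paper).
N_SPECTRA = 1_000_000


def speedup(cpu_ms: float, gpu_ms: float) -> float:
    """Return the ratio of CPU time to GPU time for one spectrum."""
    return cpu_ms / gpu_ms


def total_hours(n_spectra: int, ms_per_spectrum: float) -> float:
    """Return wall-clock hours to process n_spectra sequentially."""
    seconds = n_spectra * ms_per_spectrum / 1000.0
    return seconds / 3600.0


print(f"GPU speedup over CPU: {speedup(CPU_MS, GPU_MS):.1f}x")
print(f"CPU, {N_SPECTRA} spectra: {total_hours(N_SPECTRA, CPU_MS):.1f} h")
print(f"GPU, {N_SPECTRA} spectra: {total_hours(N_SPECTRA, GPU_MS):.2f} h")
```

Under these assumptions the GPU path is roughly two hundred times faster, which turns a multi-day sequential CPU job over the hypothetical archive into about an hour. Batching or parallel CPU execution would change the absolute numbers.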
