cecilia: A Machine Learning-Based Pipeline for Measuring Metal Abundances of Helium-rich Polluted White Dwarfs
M. Badenas-Agusti, J. Viaña, A. Vanderburg, S. Blouin, P. Dufour, S. Xu, L. Sha
TL;DR
cecilia addresses the scalability bottleneck in polluted white dwarf spectroscopy by training a neural-network interpolator over a high-dimensional label space. The interpolator rapidly generates synthetic spectra, which are then used to retrieve 13 stellar parameters, including 11 metal abundances, for He-rich, intermediate-temperature WDs. The pipeline combines an autoencoder, FCNN1, and a fine-tuned network, FT FCNN2, to produce high-fidelity spectral predictions, followed by a fast Levenberg–Marquardt fit and a Bayesian MCMC analysis to obtain robust posteriors. On synthetic data, it retrieves up to 10 metals with typical uncertainties of $\lesssim$0.1 dex. It also demonstrates practical applicability by fitting the SDSS spectrum of WD 1232+563, yielding abundances in agreement with the literature and highlighting correlations among parameters. By dramatically speeding up the analysis and providing principled uncertainty quantification and degeneracy visualization, this approach enables population-scale studies of exoplanetary debris in the era of big data.
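The two-stage fit mentioned above (a fast Levenberg–Marquardt optimization followed by MCMC sampling) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not cecilia's code: `toy_spectrum` is a hypothetical three-parameter stand-in for the neural-network spectrum generator, the flat priors and step sizes are arbitrary, and a plain Metropolis sampler is swapped in for whatever MCMC sampler the pipeline actually uses.

```python
import numpy as np
from scipy.optimize import least_squares


def toy_spectrum(theta, wave):
    """Hypothetical stand-in for the emulator: one Gaussian absorption line."""
    depth, center, width = theta
    return 1.0 - depth * np.exp(-0.5 * ((wave - center) / width) ** 2)


def residuals(theta, wave, flux, sigma):
    """Noise-normalized residuals between the model and the observed flux."""
    return (toy_spectrum(theta, wave) - flux) / sigma


def log_posterior(theta, wave, flux, sigma):
    """Gaussian log-likelihood with flat priors (prior bounds are assumptions)."""
    depth, center, width = theta
    inside = 0.0 < depth < 1.0 and wave[0] < center < wave[-1] and 0.1 < width < 10.0
    if not inside:
        return -np.inf
    return -0.5 * np.sum(residuals(theta, wave, flux, sigma) ** 2)


def metropolis(log_prob, start, step, n_steps, rng, args=()):
    """Random-walk Metropolis sampler with independent Gaussian proposals."""
    current = np.asarray(start, dtype=float)
    current_lp = log_prob(current, *args)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        proposal = current + step * rng.standard_normal(current.size)
        proposal_lp = log_prob(proposal, *args)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain


# Simulate a noisy observation from known parameters.
rng = np.random.default_rng(0)
wave = np.linspace(3900.0, 3960.0, 200)
truth = np.array([0.4, 3933.7, 2.0])  # depth, line center, line width
sigma = 0.02
flux = toy_spectrum(truth, wave) + sigma * rng.standard_normal(wave.size)

# Stage 1: fast Levenberg-Marquardt point estimate.
lm = least_squares(residuals, x0=[0.3, 3932.0, 3.0],
                   args=(wave, flux, sigma), method="lm")

# Stage 2: sample the posterior, starting the chain at the LM solution.
chain = metropolis(log_posterior, lm.x, step=np.array([0.005, 0.05, 0.05]),
                   n_steps=5000, rng=rng, args=(wave, flux, sigma))
posterior = chain[1000:]  # discard burn-in
median = np.median(posterior, axis=0)
spread = np.std(posterior, axis=0)
```

Starting the sampler at the optimizer's solution shortens burn-in, and the retained samples provide both the uncertainties (`spread`) and, through their covariance, a view of the parameter correlations that a point estimate alone cannot show.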
Abstract
Over the past several decades, conventional spectral analysis techniques for polluted white dwarfs have become powerful tools to learn about the geology and chemistry of extrasolar bodies. Despite their proven capabilities and extensive legacy of scientific discoveries, these techniques are still limited by their manual, time-intensive, and iterative nature. As a result, they are susceptible to human error and are difficult to scale up to population-wide studies of metal pollution. This paper seeks to address this problem by presenting cecilia, the first Machine Learning (ML)-powered spectral modeling code designed to measure the metal abundances of intermediate-temperature ($10{,}000 \leq T_{\rm eff} \leq 20{,}000$ K), helium-rich polluted white dwarfs. Trained with more than 22,000 randomly drawn atmosphere models and stellar parameters, our pipeline aims to overcome the limitations of classical methods by replacing the generation of synthetic spectra from computationally expensive codes and uniformly spaced model grids with a fast, automated, and efficient neural-network-based interpolator. More specifically, cecilia combines state-of-the-art atmosphere models, powerful artificial intelligence tools, and robust statistical techniques to rapidly generate synthetic spectra of polluted white dwarfs in high-dimensional space, and to enable accurate ($\lesssim$0.1 dex) and simultaneous measurements of 14 stellar parameters -- including 11 elemental abundances -- from real spectroscopic observations. As massively multiplexed astronomical surveys begin scientific operations, cecilia's performance has the potential to unlock large-scale studies of extrasolar geochemistry and propel the field of white dwarf science into the era of Big Data. In doing so, we aspire to uncover new statistical insights that were previously impractical to obtain with traditional white dwarf characterisation techniques.
