cecilia: A Machine Learning-Based Pipeline for Measuring Metal Abundances of Helium-rich Polluted White Dwarfs

M. Badenas-Agusti, J. Viaña, A. Vanderburg, S. Blouin, P. Dufour, S. Xu, L. Sha

TL;DR

cecilia addresses the scalability bottleneck in polluted white dwarf spectroscopy by learning a neural interpolator over a high-dimensional label space to rapidly generate synthetic spectra and retrieve 13 stellar parameters, including 11 metal abundances, for He-rich intermediate-temperature WDs. The pipeline combines an Autoencoder, FCNN1, and a fine-tuned network (FT FCNN2) to produce high-fidelity spectral predictions, followed by a fast Levenberg–Marquardt fit and a Bayesian MCMC to obtain robust posteriors. On synthetic data, it retrieves up to 10 metals with typical uncertainties of $\lesssim 0.1$ dex. It also demonstrates practical applicability by fitting the SDSS spectrum of WD 1232+563, yielding abundances in agreement with the literature and highlighting correlations among parameters. This approach enables population-scale studies of exoplanetary debris in the era of big data by dramatically speeding up analysis and providing principled uncertainty quantification and degeneracy visualization.
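The Levenberg–Marquardt step mentioned above can be illustrated with a toy example. The sketch below is not cecilia's implementation: it assumes a hypothetical two-parameter model (`line_model`, a unit-width Gaussian absorption line with a depth and a centre) in place of the neural spectral predictor, and fits noise-free synthetic data. All function and variable names are illustrative.

```python
import math


def line_model(x, depth, centre):
    """Hypothetical toy flux: unit continuum minus a unit-width Gaussian line."""
    return 1.0 - depth * math.exp(-0.5 * (x - centre) ** 2)


def normal_equations(xs, ys, depth, centre):
    """Return (cost, a11, a12, a22, g1, g2): squared-residual cost, J^T J, J^T r."""
    cost = a11 = a12 = a22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        gauss = math.exp(-0.5 * (x - centre) ** 2)
        r = y - (1.0 - depth * gauss)        # residual: data minus model
        j1 = -gauss                          # d(model)/d(depth)
        j2 = -depth * gauss * (x - centre)   # d(model)/d(centre)
        cost += r * r
        a11 += j1 * j1
        a12 += j1 * j2
        a22 += j2 * j2
        g1 += j1 * r
        g2 += j2 * r
    return cost, a11, a12, a22, g1, g2


def levenberg_marquardt(xs, ys, depth, centre, n_iter=50, damping=1e-3):
    """Damped Gauss-Newton iterations with an accept/reject rule on the cost."""
    cost, a11, a12, a22, g1, g2 = normal_equations(xs, ys, depth, centre)
    for _ in range(n_iter):
        # Inflate the diagonal of J^T J by the damping factor (Marquardt scaling).
        d11 = a11 * (1.0 + damping)
        d22 = a22 * (1.0 + damping)
        det = d11 * d22 - a12 * a12
        # Solve the damped 2x2 system with Cramer's rule.
        step_depth = (g1 * d22 - a12 * g2) / det
        step_centre = (d11 * g2 - a12 * g1) / det
        trial = normal_equations(xs, ys, depth + step_depth, centre + step_centre)
        if trial[0] < cost:
            # Step lowered the cost: accept it and trust the quadratic model more.
            depth += step_depth
            centre += step_centre
            cost, a11, a12, a22, g1, g2 = trial
            damping /= 10.0
        else:
            # Step raised the cost: reject it and damp more strongly.
            damping *= 10.0
    return depth, centre


# Noise-free synthetic "observation" generated with depth=0.5, centre=0.0.
xs = [-5.0 + 0.1 * i for i in range(101)]
ys = [line_model(x, 0.5, 0.0) for x in xs]
fit_depth, fit_centre = levenberg_marquardt(xs, ys, depth=0.3, centre=0.5)
```

In a pipeline of this kind, a fast point estimate like `(fit_depth, fit_centre)` would typically serve as the starting point for the MCMC sampler, which then maps out the uncertainties and parameter correlations.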

Abstract

Over the past several decades, conventional spectral analysis techniques of polluted white dwarfs have become powerful tools to learn about the geology and chemistry of extrasolar bodies. Despite their proven capabilities and extensive legacy of scientific discoveries, these techniques remain limited by their manual, time-intensive, and iterative nature. As a result, they are susceptible to human error and are difficult to scale up to population-wide studies of metal pollution. This paper seeks to address this problem by presenting cecilia, the first Machine Learning (ML)-powered spectral modeling code designed to measure the metal abundances of intermediate-temperature ($10{,}000 \leq T_{\rm eff} \leq 20{,}000$ K), helium-rich polluted white dwarfs. Trained with more than 22,000 randomly drawn atmosphere models and stellar parameters, our pipeline aims to overcome the limitations of classical methods by replacing the generation of synthetic spectra from computationally expensive codes and uniformly spaced model grids with a fast, automated, and efficient neural-network-based interpolator. More specifically, cecilia combines state-of-the-art atmosphere models, powerful artificial intelligence tools, and robust statistical techniques to rapidly generate synthetic spectra of polluted white dwarfs in high-dimensional space and to enable accurate ($\lesssim 0.1$ dex) and simultaneous measurements of 14 stellar parameters, including 11 elemental abundances, from real spectroscopic observations. As massively multiplexed astronomical surveys begin scientific operations, cecilia's performance has the potential to unlock large-scale studies of extrasolar geochemistry and propel the field of white dwarf science into the era of Big Data. In doing so, we aspire to uncover new statistical insights that were previously impractical with traditional white dwarf characterisation techniques.

Paper Structure

This paper contains 22 sections, 16 equations, 13 figures, and 4 tables.

Figures (13)

  • Figure 1: Parameters and operations associated with a single "neuron," which represents the smallest unit of a neural network.
  • Figure 2: A selection of 6 atmosphere models for synthetic He-rich polluted white dwarfs with effective temperatures between 10,000 K and 20,000 K, and surface gravities between 7 cgs and 9 cgs. Cool and hot white dwarfs are shown, respectively, in red and blue.
  • Figure 3: Effect of applying the Pixel-Wise (PW) normalisation technique to a synthetic spectrum in the spectral window between 3,800 Å and 4,000 Å. This technique is designed to improve cecilia's learning capabilities by amplifying small variations in the stellar flux and attenuating the largest fluctuations. Panels (a) and (b) show, respectively, the raw and PW-normalised synthetic flux of the spectrum. The inset plots show the region between 3,330 Å and 3,350 Å.
  • Figure 4: Mean Absolute Error (MAE) statistic (or loss function) for the Autoencoder, FCNN1, and FT FCNN2, averaged for the 29 windows of 200 Å used to train cecilia. The training and validation sets are shown in blue and red. The dotted and dashed lines denote an average MAE of 0.01 and 0.001, respectively.
  • Figure 5: Top Panel: Synthetic spectrum corresponding to a white dwarf with $T_{\rm eff} = 18{,}301$ K and $\log g = 8.5$ cgs. To allow for a closer examination, we only show the wavelength range between 3,800 Å and 5,800 Å, which features 10 windows of 200 Å. Middle Top Panel: Denormalised cecilia FT FCNN2 prediction. Bottom Top Panel: Slope correction with a linear least-squares fit. In each panel, we use dashed vertical lines to indicate the start/end of a 200 Å spectral window.
  • ...and 8 more figures
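
Two ideas from the captions above, the single neuron of Figure 1 and the Mean Absolute Error of Figure 4, are simple enough to sketch in a few lines. This is only an illustration: the ReLU activation, the function names, and the example numbers are assumptions and are not taken from the paper.

```python
def relu(z):
    """Rectified linear unit; assumed here, the paper's activation may differ."""
    return z if z > 0.0 else 0.0


def neuron(inputs, weights, bias, activation=relu):
    """Single neuron: weighted sum of the inputs, plus a bias, then an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def mean_absolute_error(predicted, target):
    """MAE: average absolute difference between predicted and target values."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


# Hypothetical numbers, chosen only to exercise the two functions.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
loss = mean_absolute_error(predicted=[1.0, 2.0, 3.0], target=[1.5, 2.0, 2.0])
```

A full network stacks many such neurons into layers, and training adjusts the weights and biases to reduce a loss such as the MAE on the training set while monitoring it on the validation set.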