Towards model-free stellar chemical abundances. Potential applications in the search for chemically peculiar stars in large spectroscopic surveys
Theosamuele Signor, Paula Jofré, Hernan Lira, Sara Vitali, Luis Martí, Nayat Sánchez-Pi
TL;DR
This work tackles the challenge of extracting stellar chemical abundances from spectra without relying on extensive labeled catalogs or imperfect atmosphere models. It introduces a self-supervised, disentangled representation learning framework based on a variational autoencoder with element-specific decoders, which enforces that latent features correspond to $[Fe/H]$, $[C/Fe]$, and $[\alpha/Fe]$ while separating out non-chemical factors such as $T_{\rm eff}$ and $\log g$. The model reconstructs synthetic low-resolution spectra with high quality, and its latent axes correlate strongly with the intended abundances ($r = 0.92$, $0.92$, and $0.82$, respectively), enabling robust, high-precision flagging of chemically enhanced or depleted stars (e.g., $\alpha$PMP and CEMP stars). It also introduces selective gradient flow to prevent cross-talk between latent factors, and relies on a Gaussian prior to facilitate outlier detection in a principled way. The approach holds promise for scalable chemical tagging in large spectroscopic surveys and offers a complementary path to model-based abundance inference, with demonstrated potential on real data and clear avenues for extension to additional elements and higher-resolution datasets.
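To make the role of the Gaussian prior concrete, here is a minimal sketch of how a star could be flagged from its position along one latent axis. It is an illustration under stated assumptions, not the authors' procedure: it assumes the latent coordinate follows a standard normal distribution and increases with the corresponding abundance. The function name `flag_latent`, the `tail_prob` threshold, and the example values are all hypothetical.

```python
from statistics import NormalDist


def flag_latent(z_values, tail_prob=0.025):
    """Label latent coordinates by their position under an assumed N(0, 1) prior.

    Assumption (not from the paper): the axis is oriented so that larger
    values mean a higher abundance.
    """
    # Quantile that leaves `tail_prob` of the prior mass in each tail.
    cut = NormalDist(mu=0.0, sigma=1.0).inv_cdf(1.0 - tail_prob)
    labels = []
    for z in z_values:
        if z > cut:
            labels.append("enhanced")
        elif z < -cut:
            labels.append("depleted")
        else:
            labels.append("normal")
    return labels


# Hypothetical latent coordinates along a single chemical axis (made-up numbers).
z_example = [-2.7, -0.3, 0.1, 1.2, 3.1]
labels = flag_latent(z_example)
print(labels)
```

Because the cut is defined by a tail probability of the prior rather than by an abundance value, the same rule can be applied to any latent axis without a labeled catalog; how well it tracks real abundance classes depends on how closely the learned latent distribution matches the prior.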
Abstract
Chemical abundance determinations from stellar spectra are challenged by observational noise, limitations in stellar models, and departures from simplifying assumptions. While traditional and supervised machine learning methods have made remarkable progress in estimating atmospheric parameters and chemical compositions within existing physical models, these factors still constrain our ability to fully exploit the vast data sets provided by modern spectroscopic surveys. We aim to develop a self-supervised, disentangled representation learning framework that extracts chemically meaningful features directly from spectra, without relying on externally imposed label catalogs. We build a variational autoencoder-based representation learning model with physics-inspired structure: multiple decoders each focus on spectral regions dominated by a particular element, enforcing that each latent dimension maps to a single abundance. To evaluate the potential application of our framework, we trained and validated the model on low-resolution, low signal-to-noise synthetic spectra focusing on $\rm [Fe/H]$, $\rm [C/Fe]$, and $\rm [\alpha/Fe]$. We then demonstrate how the trained model can be used to flag stars as chemically enhanced or depleted in these abundances based on their position within the latent distribution. Our model successfully learns a representation of spectra whose axes correlate tightly with the target abundances ($r=0.92\pm0.01$ for $\rm [Fe/H]$, $r=0.92\pm0.01$ for $\rm [C/Fe]$, $r=0.82\pm0.02$ for $\rm [\alpha/Fe]$). The disentangled representations provide a robust means to distinguish stars based on their chemical properties, offering an efficient and scalable solution for large spectroscopic surveys.
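The idea that each decoder focuses on the spectral regions dominated by one element can be sketched as a masked reconstruction loss, where every decoder is scored only on its own pixels. This is a toy illustration, not the authors' loss: the abstract does not specify the masks, the loss function, or the decoder architecture, so the helper names (`masked_mse`, `per_element_losses`), the 6-pixel spectrum, and the mask layout below are all hypothetical.

```python
def masked_mse(observed, reconstructed, mask):
    """Mean squared error over the pixels where `mask` is True."""
    selected = [
        (o - r) ** 2
        for o, r, m in zip(observed, reconstructed, mask)
        if m
    ]
    if not selected:
        raise ValueError("mask selects no pixels")
    return sum(selected) / len(selected)


def per_element_losses(observed, reconstructions, masks):
    """One loss per element: each decoder output is scored on its own region only."""
    return {
        element: masked_mse(observed, reconstructions[element], masks[element])
        for element in masks
    }


# Toy 6-pixel spectrum; fluxes and masks are made up for illustration.
observed = [1.0, 0.8, 0.9, 0.5, 1.0, 0.7]
masks = {
    "Fe": [True, True, False, False, False, False],
    "C": [False, False, True, True, False, False],
    "alpha": [False, False, False, False, True, True],
}
# Hypothetical decoder outputs; values outside each mask are ignored by the loss.
reconstructions = {
    "Fe": [1.0, 0.6, 0.0, 0.0, 0.0, 0.0],
    "C": [0.0, 0.0, 0.9, 0.5, 0.0, 0.0],
    "alpha": [0.0, 0.0, 0.0, 0.0, 0.8, 0.7],
}
losses = per_element_losses(observed, reconstructions, masks)
print(losses)
```

Since a decoder's loss ignores pixels outside its mask, the latent dimension feeding that decoder is only rewarded for explaining features in its own region, which is one plausible way to tie a latent dimension to a single abundance.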
