Machine-learning inference of stellar properties using integrated photometric and spectroscopic data
Ilay Kamai, Alex M. Bronstein, Hagai B. Perets
TL;DR
DESA is a multimodal foundation model that unifies photometric light curves and spectra into a physically informative stellar latent space. It first trains modality-specific encoders with a hybrid self-supervised/supervised objective, then aligns them with the DualFormer module. DualFormer combines self- and cross-attention with a dual-projection alignment loss and a covariance regularizer, yielding an eigenspace that captures structure shared across modalities. Empirically, DESA achieves state-of-the-art performance on binary detection (AUC = $0.99$, AP = $1.00$) and stellar age prediction (RMSE = $0.94$ Gyr), while zero- and few-shot evaluations recover color-magnitude (CMD) and Hertzsprung-Russell (HR) diagrams with $R^2 = 0.92$ and support population discovery (e.g., separating synchronized binaries from young stars in latent space). The work shows that integrating heterogeneous surveys through a purpose-built multimodal architecture improves predictive accuracy and yields new astrophysical insight, opening the way to population-level analysis and discovery in large stellar surveys.
Abstract
Stellar astrophysics relies on diverse observational modalities, primarily photometric light curves and spectroscopic data, from which fundamental stellar properties are inferred. While machine learning (ML) has advanced analysis within individual modalities, the complementary information encoded across modalities remains largely underexploited. We present DESA (Dual Embedding model for Stellar Astrophysics), a novel multimodal foundation model that integrates light curves and spectra to learn a unified, physically meaningful latent space for stars. DESA first trains separate modality-specific encoders using a hybrid supervised/self-supervised scheme, and then aligns them through DualFormer, a Transformer-based cross-modal integration module tailored for astrophysical data. DualFormer combines cross- and self-attention, a novel dual-projection alignment loss, and a projection-space eigendecomposition that yields physically structured embeddings. We demonstrate that DESA significantly outperforms leading unimodal and self-supervised baselines across a range of tasks. In zero- and few-shot settings, DESA's learned representations recover stellar color-magnitude and Hertzsprung-Russell diagrams with high fidelity ($R^2 = 0.92$ for photometric regressions). With full fine-tuning, DESA achieves state-of-the-art accuracy for binary star detection (AUC = $0.99$, AP = $1.00$) and stellar age prediction (RMSE = $0.94$ Gyr). As a case study, DESA naturally separates synchronized binaries from young stars, two populations with nearly identical light curves, purely from their embedded positions in UMAP space, without requiring external kinematic or luminosity information. DESA thus offers a powerful new framework for multimodal, data-driven stellar population analysis, enabling both accurate prediction and novel discovery.
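For readers less familiar with the regression metrics quoted above, the following is a minimal sketch of how RMSE and $R^2$ are conventionally defined. It is not the authors' evaluation code: the function names `rmse` and `r_squared` and the toy age values are illustrative assumptions, and the numbers are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the targets (e.g. Gyr)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy, made-up stellar ages in Gyr (illustrative only, not data from the paper).
true_ages = [1.0, 3.0, 5.0, 7.0]
pred_ages = [2.0, 3.0, 4.0, 7.0]

age_rmse = rmse(true_ages, pred_ages)
age_r2 = r_squared(true_ages, pred_ages)
print(f"RMSE = {age_rmse:.3f} Gyr, R^2 = {age_r2:.3f}")
```

Under these definitions, RMSE carries the units of the predicted quantity, so an RMSE of $0.94$ Gyr is a typical age error, whereas $R^2$ is dimensionless and measures the fraction of target variance explained by the predictions.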
