Multi-modal Foundation Model for Cosmological Simulation Data
Bin Xia, Nesar Ramachandra, Azton I. Wells, Salman Habib, John Wise
TL;DR
MOSAIC is an encoder-only Transformer that aims to bridge cosmological simulations and observations. Trained on $185{,}247$ galaxies from the Last Journey simulation, with $20{,}583$ more held out for testing, it learns a unified representation across scalar ($z$, $M_{\rm halo}$, $M_{\star}$) and vector (photometry, SFH, SED) modalities. The model uses masked regression with a dynamic masking scheme to enable cross-modal translation and missing-data imputation. Combining complementary modalities improves redshift and stellar-mass inference by $50\%$ and $63\%$, respectively, and latent-space analyses reveal astrophysically meaningful clustering and correlations. The work lays the groundwork for extending the model to higher-dimensional data and probabilistic decoding, enabling tighter integration of simulations and observations for future cosmological inference.
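The masked-regression objective with dynamic masking can be illustrated with a minimal sketch. It is not MOSAIC's implementation: the modality names and lengths, the one-token-per-scalar and one-token-per-vector-element layout, the masking probabilities, and the helper names `sample_dynamic_mask` and `masked_mse` are all hypothetical. A fresh mask is drawn at each step, and the regression loss is averaged only over hidden tokens. As a result, the same objective covers cross-modal prediction (a whole modality hidden) and within-modality imputation (some elements hidden).

```python
import random

# Hypothetical token layout for one galaxy: each scalar modality is one token,
# each vector modality contributes one token per element. Names and lengths are
# illustrative assumptions, not MOSAIC's actual configuration.
MODALITY_LENGTHS = {
    "redshift": 1,
    "halo_mass": 1,
    "stellar_mass": 1,
    "photometry": 6,  # assumed: six broadband magnitudes
    "sfh": 10,        # assumed: ten star-formation-history time bins
}


def sample_dynamic_mask(rng, p_hide_modality=0.3, p_hide_element=0.2):
    """Draw a fresh mask; True marks a token hidden from the encoder.

    Assumed scheme: each modality is either hidden entirely (a cross-modal
    prediction target) or has a random subset of its elements hidden
    (within-modality imputation). Masks that hide nothing or everything are
    rejected and redrawn.
    """
    while True:
        mask = {}
        for name, length in MODALITY_LENGTHS.items():
            if rng.random() < p_hide_modality:
                mask[name] = [True] * length
            else:
                mask[name] = [rng.random() < p_hide_element for _ in range(length)]
        flat = [bit for bits in mask.values() for bit in bits]
        if any(flat) and not all(flat):
            return mask


def masked_mse(pred, target, mask):
    """Mean squared error averaged over hidden tokens only."""
    total, count = 0.0, 0
    for name, bits in mask.items():
        for p, t, hidden in zip(pred[name], target[name], bits):
            if hidden:
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask hides no tokens; the loss is undefined")
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    mask = sample_dynamic_mask(rng)  # a new mask is drawn at every training step
    target = {name: [0.0] * n for name, n in MODALITY_LENGTHS.items()}
    pred = {name: [1.0] * n for name, n in MODALITY_LENGTHS.items()}
    # Every hidden token has squared error 1.0, so the masked mean is 1.0.
    print(masked_mse(pred, target, mask))
```

Visible tokens contribute nothing to the loss. The model is therefore rewarded only for recovering what it was not shown.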
Abstract
We present a multi-modal foundation model for astrophysical galaxy data, designed to map between simulation- and observation-based galactic features. Our encoder-only transformer flexibly ingests scalar quantities (e.g., redshifts, galaxy masses) and vectors (e.g., star formation histories, spectra), supporting multi-task training that includes within-modality reconstruction and cross-modality prediction. With a dynamic masking strategy, the model can query arbitrary galaxy properties from partial inputs (for example, predicting spectra from redshift and mass, or estimating photometric redshifts from broadband magnitudes) while also recovering missing segments within a modality. Trained on 185,000 simulated galaxies from a gigaparsec-scale cosmology simulation, the model achieves two notable gains. Redshift estimation improves by 50% when LSST and SPHEREx photometry are combined, relative to LSST photometry alone. Stellar mass inference improves by 63% when late-time star formation history (SFH) is combined with LSST photometry, relative to early-time SFH combined with LSST photometry. The model generalizes well across multi-modal tasks and lays the groundwork for future integration of higher-dimensional and structured data such as images, merger trees, and 3D fields. This approach provides a unified framework for connecting simulations and observations, advancing the development of generalizable astrophysical foundation models.
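The encoder sees only unmasked tokens. A query such as estimating photometric redshifts from broadband magnitudes therefore amounts to choosing, at inference time, which modalities are visible and which are hidden. The sketch below expresses the queries described above as masks. It assumes a hypothetical modality layout: the names, the placeholder SPHEREx and SED lengths, and the helpers `build_query_mask` and `hide_segment` are illustrative and not taken from the model.

```python
# Hypothetical modality layout; names and vector lengths are illustrative
# assumptions, not MOSAIC's configuration. True in a mask = hidden, i.e. a
# quantity the model is asked to predict.
MODALITY_LENGTHS = {
    "redshift": 1,
    "stellar_mass": 1,
    "lsst_mags": 6,      # LSST ugrizy broadband magnitudes
    "spherex_mags": 16,  # placeholder length, not SPHEREx's actual channel count
    "sed": 50,           # placeholder number of SED wavelength bins
}


def build_query_mask(known, lengths=MODALITY_LENGTHS):
    """Show the modalities in `known` to the encoder and query all the others."""
    unknown = set(known) - set(lengths)
    if unknown:
        raise KeyError(f"unknown modalities: {sorted(unknown)}")
    return {name: [name not in known] * n for name, n in lengths.items()}


def hide_segment(mask, name, start, stop):
    """Return a copy of `mask` with elements [start, stop) of one vector hidden."""
    out = {key: list(bits) for key, bits in mask.items()}
    for i in range(start, stop):
        out[name][i] = True
    return out


# Photometric redshift from LSST magnitudes alone, and from LSST + SPHEREx.
photo_z_lsst = build_query_mask({"lsst_mags"})
photo_z_combined = build_query_mask({"lsst_mags", "spherex_mags"})
# Spectrum (SED) from redshift and stellar mass.
sed_from_scalars = build_query_mask({"redshift", "stellar_mass"})
# Everything visible except a missing stretch of the SED, to be imputed.
sed_gap = hide_segment(build_query_mask(set(MODALITY_LENGTHS)), "sed", 10, 20)
```

Under this framing, one trained network serves every query. Only the mask changes between photometric-redshift estimation, spectrum prediction, and gap filling.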
