AstroM$^3$: A self-supervised multimodal model for astronomy

Mariia Rizhko, Joshua S. Bloom

TL;DR

This work constructs an astronomical multimodal dataset and proposes AstroM$^3$, a self-supervised pre-training approach that enables a model to learn from multiple modalities simultaneously. To the authors' knowledge, it is the first construction of an $n>2$ mode model in astronomy.

Abstract

While machine-learned models are now routinely employed to facilitate astronomical inquiry, model inputs tend to be limited to a primary data source (namely images or time series) and, in the more advanced approaches, some metadata. Yet with the growing use of wide-field, multiplexed observational resources, individual sources of interest often have a broad range of observational modes available. Here we construct an astronomical multimodal dataset and propose AstroM$^3$, a self-supervised pre-training approach that enables a model to learn from multiple modalities simultaneously. Specifically, we extend the CLIP (Contrastive Language-Image Pretraining) model to a trimodal setting, allowing the integration of time-series photometry data, spectra, and astrophysical metadata. In a fine-tuning supervised setting, our results demonstrate that CLIP pre-training improves classification performance for time-series photometry, where accuracy increases from 84.6% to 91.5%. Furthermore, CLIP boosts classification accuracy by up to 12.6% when the availability of labeled data is limited, showing the effectiveness of leveraging larger corpora of unlabeled data. In addition to fine-tuned classification, we can use the trained model in other downstream tasks that are not explicitly contemplated during the construction of the self-supervised model. In particular, we show the efficacy of using the learned embeddings for misclassification identification, similarity search, and anomaly detection. One surprising highlight is the "rediscovery" of Mira subtypes and two rotational variable subclasses using manifold learning and dimension reduction algorithms. To our knowledge, this is the first construction of an $n>2$ mode model in astronomy. Extensions to $n>3$ modes are naturally anticipated with this approach.

Paper Structure

This paper contains 16 sections, 5 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Overview of the multimodal CLIP framework adapted for astronomy, incorporating three data modalities: photometric time-series, spectra, and metadata. Each modality is processed by a dedicated encoder to create embeddings, which are then mapped into a shared embedding space through projection heads. Pairwise similarity matrices align the embeddings across modalities, and a symmetric cross-entropy loss, computed over these matrices, optimizes the model. The total loss, derived from all pairwise losses, guides the model’s trimodal learning.
  • Figure 2: UMAP visualizations of multimodal embeddings: (a) training set and (b) test set, showing class separability and alignment between sets. Each source in the training and test sets is coloured by the class determined in jayasinghe2024var, but these class labels are not used in the construction of the embeddings.
  • Figure 3: Examples of catalog misclassifications with photometry and spectrum for each object. Top to bottom: (1) Likely EW misclassified as HADS; (2) V* AC CMi, a known semi-detached binary misclassified as RR Lyrae; (3) Possible SR or Mira variable with period alignment issues; (4) Known Mira variable (V0439 Cas) misclassified as SR; (5) Likely EW binary 2023AA...674A..16M misclassified as RRC.
  • Figure 4: Examples of in-class outliers flagged by the model due to distinctive features, despite correct labels. (a) EA-type star, V1174 Ori, an X-ray bright pre-main sequence system 2022ApJ...941..125S. (b) EB-type star with unusual out-of-eclipse modulations, possibly due to rotation. (c) Semi-detached binary with emission lines. (d) Likely an EB misclassified as EA, with light curve patterns indicating rotation or pulsation.
  • Figure 5: Color-magnitude diagram for ROT variables, with two clusters identified through unsupervised learning as giants and dwarfs.
  • ...and 4 more figures
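
The trimodal objective described in the Figure 1 caption (projected embeddings, pairwise similarity matrices, a symmetric cross-entropy loss per pair, and a total over all pairs) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation. The function names, the cosine normalisation, the temperature value of 0.07, the 0.5 weighting of the two directions, and the plain sum of the three pairwise losses are all assumptions made for the example. The inputs stand in for the outputs of the projection heads, with row i of each array belonging to the same source.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_clip_loss(a, b, temperature=0.07):
    """Symmetric cross-entropy over the similarity matrix of two modalities.

    a, b: (N, D) projected embeddings; row i of each is the same source.
    temperature is an assumed hyperparameter, not a value from the paper.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (N, N); the diagonal holds matched pairs
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

def trimodal_loss(photometry, spectra, metadata, temperature=0.07):
    # Assumed aggregation: an unweighted sum of the three pairwise losses.
    return (symmetric_clip_loss(photometry, spectra, temperature)
            + symmetric_clip_loss(spectra, metadata, temperature)
            + symmetric_clip_loss(metadata, photometry, temperature))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))  # toy stand-in for projected embeddings
    print("aligned:   ", trimodal_loss(z, z, z))
    print("misaligned:", trimodal_loss(z, np.roll(z, 1, axis=0), z))
```

In this toy setup the loss is small when the three modalities agree on every source and grows when one modality's rows are shifted so that matched pairs no longer sit on the diagonal of the similarity matrices, which is the behaviour the contrastive objective relies on.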