Maven: A Multimodal Foundation Model for Supernova Science

Gemma Zhang, Thomas Helfer, Alexander T. Gagliano, Siddharth Mishra-Sharma, V. Ashley Villar

TL;DR

This work presents Maven, the first foundation model for supernova science, which reaches state-of-the-art performance on both classification and redshift estimation, despite the embeddings not being explicitly optimized for these tasks.

Abstract

A common setting in astronomy is the availability of a small number of high-quality observations, and larger amounts of either lower-quality observations or synthetic data from simplified models. Time-domain astrophysics is a canonical example of this imbalance, with the number of supernovae observed photometrically outpacing the number observed spectroscopically by multiple orders of magnitude. At the same time, no data-driven models exist to understand these photometric and spectroscopic observables in a common context. Contrastive learning objectives, which have grown in popularity for aligning distinct data modalities in a shared embedding space, provide a potential solution to extract information from these modalities. We present Maven, the first foundation model for supernova science. To construct Maven, we first pre-train our model to align photometry and spectroscopy from 0.5M synthetic supernovae using a contrastive objective. We then fine-tune the model on 4,702 observed supernovae from the Zwicky Transient Facility. Maven reaches state-of-the-art performance on both classification and redshift estimation, despite the embeddings not being explicitly optimized for these tasks. Through ablation studies, we show that pre-training with synthetic data improves overall performance. In the upcoming era of the Vera C. Rubin Observatory, Maven serves as a Rosetta Stone for leveraging large, unlabeled, and multimodal time-domain datasets.
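
To make the idea of a contrastive objective concrete, here is a minimal sketch of one common choice, the symmetric InfoNCE (CLIP-style) loss. The abstract does not state which contrastive loss Maven uses, so the loss form, the temperature value, and all function names below are assumptions for illustration only. The sketch uses plain Python for readability; a real implementation would use an array library with automatic differentiation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_info_nce(lc_emb, spec_emb, temperature=0.1):
    """Assumed loss form: lc_emb[i] and spec_emb[i] embed the same supernova."""
    lc = [l2_normalize(v) for v in lc_emb]
    sp = [l2_normalize(v) for v in spec_emb]
    n = len(lc)
    # Similarity logits: rows index light curves, columns index spectra.
    logits = [[sum(a * b for a, b in zip(lc[i], sp[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Light curve -> spectrum: the matching spectrum is the correct "class".
    loss_lc = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Spectrum -> light curve: the same computation on the transposed matrix.
    cols = [list(c) for c in zip(*logits)]
    loss_sp = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_lc + loss_sp)

# Toy check with hypothetical 2-D embeddings of two supernovae.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = symmetric_info_nce(aligned, aligned)
loss_shuffled = symmetric_info_nce(aligned, shuffled)
```

In this toy case the loss is small when each light-curve embedding points the same way as its paired spectrum embedding, and large when the pairs are swapped, which is the behavior that pulls the two modalities into a shared embedding space.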

Paper Structure (10 sections, 2 equations, 3 figures)


Figures (3)

  • Figure 1: Illustration of the mutual information between modalities considered in this work. Supernovae are characterized by a spectral energy distribution (SED) varying in wavelength and time. Photons are collected through broad-band filters (ZTF-$g$ and ZTF-$r$, with transmission curves shown in the top left) at multiple epochs during explosion to construct multi-band light curves (bottom right). At a single epoch, supernova light can also be dispersed with a spectrograph (SEDM in this work) to obtain a spectrum (top right). Both modalities offer a complementary, but limited, view of an SN's underlying SED.
  • Figure 2: Overview of our training workflows. We first pre-train on a large simulated dataset using contrastive methods (using light curves and spectra). We then train on the observational ZTF dataset and use a simple model to map the resulting embeddings to downstream tasks. Colors distinguish workflows by their first training step, and the arrows that follow indicate the subsequent training steps.
  • Figure 3: Overview of the model architecture. We start by using a linear layer to increase the dimensionality of the features from one to the chosen embedding dimension. Next, we encode both time and wavelength using sinusoidal embeddings. For light curves specifically, we additionally include embeddings to represent the band information. These combined tokens are then processed through a series of transformer blocks. To reduce the output to a single token, we pool with either a simple mean or a self-attention mechanism. Finally, we project this token to match the desired dimension of the embedding space.
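
The tokenization and pooling steps described in the Figure 3 caption can be sketched as follows. This is a rough illustration, not the authors' implementation: the caption does not say how the components are combined, so summing the lifted flux, the sinusoidal time encoding, and the band embedding is an assumption, as are the function names, the `max_period` constant, and the toy weights. The transformer blocks and the final projection are omitted, the linear layer has no bias, and only mean pooling is shown.

```python
import math

def sinusoidal_embedding(position, dim, max_period=10000.0):
    """Transformer-style sin/cos encoding of a scalar position (dim must be even)."""
    half = dim // 2
    freqs = [max_period ** (-k / half) for k in range(half)]
    return ([math.sin(position * f) for f in freqs] +
            [math.cos(position * f) for f in freqs])

def tokenize_light_curve(times, fluxes, bands, weight, band_table):
    """Build one token per observation: lifted flux + time encoding + band embedding."""
    dim = len(weight)
    tokens = []
    for t, f, b in zip(times, fluxes, bands):
        lifted = [w * f for w in weight]      # linear layer, 1 -> dim (bias omitted)
        pos = sinusoidal_embedding(t, dim)    # encodes the observation time
        band = band_table[b]                  # lookup of a per-band vector
        tokens.append([l + p + e for l, p, e in zip(lifted, pos, band)])
    return tokens

def mean_pool(tokens):
    """Reduce a sequence of tokens to a single token by averaging each dimension."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

# Hypothetical toy inputs: three observations in the ZTF g and r bands.
dim = 4
weight = [0.5, -0.5, 1.0, 0.0]
band_table = {"g": [0.1] * dim, "r": [-0.1] * dim}
tokens = tokenize_light_curve([0.0, 3.0, 7.5], [1.2, 2.0, 0.8],
                              ["g", "r", "g"], weight, band_table)
summary = mean_pool(tokens)
```

For spectra, the same pattern would apply with wavelength in place of time and no band embedding. In a full model, the tokens would pass through the transformer blocks before pooling.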