Leveraging pre-trained vision Transformers for multi-band photometric light curve classification

Daniel Moreno-Cartagena, Pavlos Protopapas, Guillermo Cabrera-Vives, Martina Cádiz-Leyton, Ignacio Becker, Cristóbal Donoso-Oliva

TL;DR

This work investigates whether a pre-trained vision Transformer (SwinV2) can classify multi-band photometric light curves without explicit feature extraction. Light curves are converted into RGB images using Grid and Overlay schemes, and the model is then fine-tuned on these images for the MACHO and ELAsTiCC datasets. SwinV2 performs competitively, often outperforming light-curve–specific models, with notable gains from multi-band inputs and strong results on the large, six-band ELAsTiCC dataset. The study suggests a scalable, generalizable framework for time-domain astronomy that could be extended with metadata and multi-modal learning for future surveys like LSST.

Abstract

This study investigates the potential of a pre-trained vision Transformer (VT) model, specifically the Swin Transformer V2 (SwinV2), to classify photometric light curves without the need for feature extraction or multi-band preprocessing. The goal is to assess whether this image-based approach can accurately differentiate astronomical phenomena and serve as a viable option for working with multi-band photometric light curves. We transformed each multi-band light curve into an image, and these images serve as input to the SwinV2 model, which is pre-trained on ImageNet-21K. The datasets employed include the public Catalog of Variable Stars from the Massive Compact Halo Object (MACHO) survey, using both one and two bands, and the first round of the recent Extended LSST Astronomical Time-Series Classification Challenge (ELAsTiCC), which includes six bands. The performance of the model was evaluated on six classes for the MACHO dataset and 20 distinct classes of variable stars and transient events for the ELAsTiCC dataset. The fine-tuned SwinV2 achieved better performance than models specifically designed for light curves, such as Astromer and the Astronomical Transformer for Time Series and Tabular Data (ATAT). When trained on the full MACHO dataset, it attained a macro F1-score of 80.2 and outperformed Astromer in single-band experiments. Incorporating a second band further improved performance, increasing the F1-score to 84.1. On the ELAsTiCC dataset, SwinV2 achieved a macro F1-score of 65.5, surpassing ATAT by 1.3 points.
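The results above are reported as macro F1-scores, which average the per-class F1 with equal weight regardless of class size. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `macro_f1` and the scaling to a 0-100 range (to match the way scores such as 80.2 are quoted) are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, scaled to 0-100 (assumed scale)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return 100.0 * sum(scores) / len(scores)


# Toy labels only; these are not results from the paper.
y_true = ["RRL", "RRL", "LPV", "LPV"]
y_pred = ["RRL", "LPV", "LPV", "LPV"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, a rare class that is classified poorly lowers the macro F1 as much as a common one would, which is why the metric suits imbalanced class distributions.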

Paper Structure

This paper contains 15 sections, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Comparison of visualization strategies. Panel (a) illustrates the "Grid" approach, while panel (b) depicts the "Overlay" approach. Each image was generated using the best hyperparameters identified for its respective method. The single-band and two-band examples show a Long-Period Variable (LPV) (ID: 24.3466.13) from the MACHO dataset, whereas the six-band example shows a Pair-Instability Supernova (PISN) (ID: 41820707) from the ELAsTiCC dataset.
  • Figure 2: SwinV2 architecture. The rounded rectangle in light blue highlights the components where the model changes the dimensions of the information, while the regular rectangles indicate where they remain fixed.
  • Figure 3: Distribution of periodic variable star classes in the training and validation sets for the MACHO dataset.
  • Figure 4: Distribution of transients, stochastic variables, and periodic variable star classes in the training and validation sets for the first round of the ELAsTiCC dataset.
  • Figure 5: SwinV2 classification performance on the MACHO test set using single- and multi-band data trained on varying sample sizes. The figure presents the best results from the hyperparameter search for both the "Overlay" and "Grid" approaches.
  • ...and 6 more figures
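
The difference between the "Grid" and "Overlay" layouts named in Figure 1 can be illustrated with a small rasterization sketch. This is a hedged illustration, not the authors' rendering pipeline: it assumes that Overlay draws all bands on one shared canvas with one color per band, and that Grid places each band in its own panel. The function names (`shared_ranges`, `pixels`, `overlay_image`, `grid_image`), the single-row panel arrangement, the pixel sizes, and the colors are all hypothetical, and details such as markers, connecting lines, and error bars are omitted.

```python
def shared_ranges(bands):
    """Common time and magnitude ranges across all bands."""
    all_t = [t for times, _ in bands for t in times]
    all_m = [m for _, mags in bands for m in mags]
    return (min(all_t), max(all_t)), (min(all_m), max(all_m))


def pixels(times, mags, size, t_range, m_range):
    """Yield (row, col) for each observation; row 0 is the brightest magnitude."""
    (t0, t1), (m0, m1) = t_range, m_range
    t_span = (t1 - t0) or 1.0  # avoid division by zero for a single epoch
    m_span = (m1 - m0) or 1.0
    for t, m in zip(times, mags):
        col = min(size - 1, int((t - t0) / t_span * size))
        row = min(size - 1, int((m - m0) / m_span * size))
        yield row, col


def overlay_image(bands, colors, size=8):
    """Assumed Overlay layout: all bands share one size x size canvas."""
    ranges = shared_ranges(bands)
    img = [[(0, 0, 0)] * size for _ in range(size)]
    for (times, mags), color in zip(bands, colors):
        for r, c in pixels(times, mags, size, *ranges):
            img[r][c] = color  # later bands overwrite earlier ones
    return img


def grid_image(bands, colors, size=8):
    """Assumed Grid layout: one size x size panel per band, tiled in a row."""
    ranges = shared_ranges(bands)
    img = [[(0, 0, 0)] * (size * len(bands)) for _ in range(size)]
    for k, ((times, mags), color) in enumerate(zip(bands, colors)):
        for r, c in pixels(times, mags, size, *ranges):
            img[r][k * size + c] = color
    return img


# Toy two-band light curve as (times, magnitudes) pairs; not real survey data.
bands = [([0.0, 1.0], [10.0, 11.0]), ([0.5, 1.0], [10.5, 11.0])]
colors = [(255, 0, 0), (0, 0, 255)]
overlay = overlay_image(bands, colors, size=4)
grid = grid_image(bands, colors, size=4)
```

In this sketch the Overlay image stays the same size as bands are added but lets coincident points from different bands hide one another, whereas the Grid image keeps every band visible at the cost of a wider canvas.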