Astromer 2

Cristobal Donoso-Oliva, Ignacio Becker, Pavlos Protopapas, Guillermo Cabrera-Vives, Martina Cádiz-Leyton, Daniel Moreno-Cartagena

TL;DR

Astromer 2 presents a self-supervised, BERT-inspired transformer for light-curve embeddings, pretrained on 1.5 million MACHO single-band light curves. It replaces masked magnitudes with a trainable MASK token, deepens the encoder to six blocks, and employs an uncertainty-weighted RMSE loss, yielding substantial improvements over Astromer 1 in low-data regimes and strong cross-dataset generalization to ATLAS. The results show enhanced downstream classification performance, robust clustering in embedding space, and efficient fine-tuning, highlighting the approach as scalable for large astronomical time-series analysis. The work also discusses ethical and practical considerations, including emissions and the open-source release of weights and code to enable broader adoption.
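
To make the "uncertainty-weighted RMSE" idea concrete, here is a minimal NumPy sketch. The exact weighting used by Astromer 2 is not given in this summary, so the inverse-variance weights, the function name `weighted_rmse`, and its arguments are assumptions made for illustration only.

```python
import numpy as np

def weighted_rmse(y_true, y_pred, sigma, probed_mask, eps=1e-8):
    """RMSE over probed points, down-weighting noisy observations.

    ASSUMPTION: weights are inverse variances (1 / sigma^2), normalized
    to sum to one. The paper may use a different weighting scheme.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    probed_mask = np.asarray(probed_mask, dtype=float)

    # Points outside the probed subset get zero weight.
    weights = probed_mask / (sigma ** 2 + eps)
    total = weights.sum()
    if total == 0.0:
        return 0.0  # nothing to score

    mse = np.sum(weights * (y_true - y_pred) ** 2) / total
    return float(np.sqrt(mse))

# Hypothetical example: the third point is wrong but also very noisy,
# so it contributes less to the loss than under equal uncertainties.
loss_equal = weighted_rmse([1, 2, 3], [1, 2, 5], [1, 1, 1], [1, 1, 1])
loss_noisy = weighted_rmse([1, 2, 3], [1, 2, 5], [1, 1, 10], [1, 1, 1])
```

Under this assumed weighting, a prediction error on a point with a large reported uncertainty is penalized less than the same error on a well-measured point.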

Abstract

Foundational models have emerged as a powerful paradigm in the deep learning field, leveraging their capacity to learn robust representations from large-scale datasets and transfer effectively to diverse downstream applications such as classification. In this paper, we present Astromer 2, a foundational model specifically designed for extracting light curve embeddings. We introduce Astromer 2 as an enhanced iteration of our self-supervised model for light curve analysis. This paper highlights the advantages of its pre-trained embeddings, compares its performance with that of its predecessor, Astromer 1, and provides a detailed empirical analysis of its capabilities, offering deeper insights into the model's representations. Astromer 2 is pretrained on 1.5 million single-band light curves from the MACHO survey using a self-supervised learning task that predicts randomly masked observations within sequences. Fine-tuning on a smaller labeled dataset allows us to assess its performance in classification tasks. The quality of the embeddings is measured by the F1 score of an MLP classifier trained on Astromer-generated embeddings. Our results demonstrate that Astromer 2 significantly outperforms Astromer 1 across all evaluated scenarios, including limited datasets of 20, 100, and 500 samples per class. The use of weighted per-sample embeddings, which integrate intermediate representations from Astromer's attention blocks, is particularly impactful. Notably, Astromer 2 achieves a 15% improvement in F1 score on the ATLAS dataset compared to prior models, showcasing robust generalization to new datasets. This enhanced performance, especially with minimal labeled data, underscores the potential of Astromer 2 for more efficient and scalable light curve analysis.
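
One plausible reading of "weighted per-sample embeddings" is a learned mixture of the outputs of every encoder block, pooled over time. The sketch below assumes one scalar logit per block, softmax-normalized, followed by a mean over valid time steps; the function name, argument names, and this particular weighting are illustrative assumptions, and the paper may compute the weights differently.

```python
import numpy as np

def weighted_embedding(block_outputs, block_logits, valid_mask):
    """Combine per-block representations into one vector per light curve.

    block_outputs: (M, L, D) array, output of each of the M encoder blocks
    block_logits:  (M,) array, ASSUMED learnable scores, one per block
    valid_mask:    (L,) array, 1 for real observations, 0 for padding
    """
    block_outputs = np.asarray(block_outputs, dtype=float)
    logits = np.asarray(block_logits, dtype=float)

    # Softmax over blocks (shifted by the max for numerical stability).
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Weighted sum over the block axis: (M,) x (M, L, D) -> (L, D).
    mixed = np.tensordot(weights, block_outputs, axes=1)

    # Mean over valid time steps only -> (D,).
    valid = np.asarray(valid_mask, dtype=float)[:, None]
    return (mixed * valid).sum(axis=0) / max(valid.sum(), 1.0)

# Hypothetical shapes: M=6 blocks, L=4 time steps, D=3 features.
outputs = np.stack([np.full((4, 3), float(m)) for m in range(6)])
embedding = weighted_embedding(outputs, np.zeros(6), [1, 1, 0, 0])
```

The resulting fixed-length vector is what a downstream MLP classifier would consume under this assumption.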

Paper Structure

This paper contains 26 sections, 5 equations, 24 figures, 2 tables.

Figures (24)

  • Figure 1: The self-supervised masking strategy used for pretraining. For each light curve, 50% of the observation points are selected as the 'probed' subset, which the model must predict. This subset consists of three components: 30% of the points are fully masked (hidden), 10% are replaced with random magnitudes, and 10% remain visible. This strategy forces the model to learn from context rather than simply memorizing positions.
  • Figure 2: Overview of the Astromer 1 architecture. An input embedding is formed by summing a positional encoding (PE) of the observation times and a linear projection of the magnitudes. This embedding is processed by an encoder composed of M=2 blocks, each containing H=4 self-attention heads, to produce the final light curve representation, which is derived from the output of the last block.
  • Figure 3: The Astromer 2 architecture, an enhanced version of the model shown in Fig. 2. The primary architectural change is at the input stage: magnitudes designated for masking are now replaced by a single, trainable MASK token. Additionally, the encoder's depth is increased to M=6 blocks to improve its representational capacity.
  • Figure 4: Magnitude distributions of the pretraining data (MACHO) and labeled sets (Alcock, ATLAS). The Alcock and MACHO distributions are similar, though Alcock is bimodal. The ATLAS data, from a different survey, shows a distinct distribution with higher variance. During training, all magnitudes are normalized, which removes the mean shifts shown here.
  • Figure 5: Observation cadence ($\Delta t$, time between consecutive points) for the MACHO, Alcock, and ATLAS datasets. The boxplots show that MACHO and Alcock have similar, regular cadences (median $\sim$3 to 4 days). The ATLAS dataset is distinct, with a much faster median cadence and greater variability in observation times. The y-axis uses a logarithmic scale to display the wide range of values.
  • ...and 19 more figures
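
The masking strategy described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 30% / 10% / 10% fractions are taken over all observations (so they add up to the 50% probed subset), and the function name `build_probed_masks` and the way random magnitudes are drawn are hypothetical.

```python
import numpy as np

def build_probed_masks(n_obs, rng, probed_frac=0.5, masked_frac=0.3, random_frac=0.1):
    """Split observation indices into the probed subset and its three parts.

    ASSUMPTION: all fractions are relative to the full light curve length.
    Returns four boolean arrays of length n_obs:
    probed, masked, randomized, visible.
    """
    n_probed = int(round(probed_frac * n_obs))
    n_masked = min(int(round(masked_frac * n_obs)), n_probed)
    n_random = min(int(round(random_frac * n_obs)), n_probed - n_masked)

    # Pick the probed points uniformly at random, without replacement.
    probed_idx = rng.permutation(n_obs)[:n_probed]

    probed = np.zeros(n_obs, dtype=bool)
    masked = np.zeros(n_obs, dtype=bool)
    randomized = np.zeros(n_obs, dtype=bool)

    probed[probed_idx] = True
    masked[probed_idx[:n_masked]] = True
    randomized[probed_idx[n_masked:n_masked + n_random]] = True

    # Whatever remains of the probed subset stays visible to the model.
    visible = probed & ~masked & ~randomized
    return probed, masked, randomized, visible

# Hypothetical usage on a synthetic light curve of 100 points.
rng = np.random.default_rng(0)
magnitudes = rng.normal(size=100)
probed, masked, randomized, visible = build_probed_masks(100, rng)

corrupted = magnitudes.copy()
# ASSUMPTION: "random magnitudes" are drawn by shuffling the same light curve.
corrupted[randomized] = rng.permutation(magnitudes)[: randomized.sum()]
# In Astromer 2 the positions in `masked` would be replaced by the trainable
# MASK token at the embedding stage, which is outside the scope of this sketch.
```

The loss would then be evaluated only at positions where `probed` is true, while the unprobed half of the light curve serves as context.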