Astromer 2

Cristobal Donoso-Oliva; Ignacio Becker; Pavlos Protopapas; Guillermo Cabrera-Vives; Martina Cádiz-Leyton; Daniel Moreno-Cartagena

TL;DR

Astromer 2 presents a self-supervised, BERT-inspired transformer for light-curve embeddings, pretrained on 1.5 million MACHO single-band light curves. It replaces masked magnitudes with a trainable MASK token, deepens the encoder to six blocks, and employs an uncertainty-weighted RMSE loss, yielding substantial improvements over Astromer 1 in low-data regimes and strong cross-dataset generalization to ATLAS. The results show enhanced downstream classification performance, robust clustering in embedding space, and efficient finetuning, highlighting the approach as scalable for large astronomical time-series analysis. The work also discusses ethical and practical considerations, including emissions and open-source release of weights and code to enable broader adoption.
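As a rough illustration of what an uncertainty-weighted RMSE over masked observations could look like, the sketch below weights each masked squared error by the inverse of its reported photometric uncertainty. The exact weighting used by Astromer 2 is not stated here, so the inverse-uncertainty scheme, the function name `weighted_masked_rmse`, and the `eps` parameter are all assumptions made for illustration.

```python
import numpy as np

def weighted_masked_rmse(y_true, y_pred, sigma, mask, eps=1e-8):
    """Hypothetical uncertainty-weighted RMSE over masked positions.

    y_true, y_pred : observed and reconstructed magnitudes, shape (n,)
    sigma          : per-observation uncertainties, shape (n,)
    mask           : 1 where the observation was masked (and must be predicted), else 0
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mask = np.asarray(mask, dtype=float)

    # Assumption: weight = 1 / sigma, so noisier points contribute less.
    weights = mask / (sigma + eps)
    squared_error = (y_true - y_pred) ** 2

    # Weighted mean of squared errors over masked positions, then square root.
    return float(np.sqrt(np.sum(weights * squared_error) / (np.sum(weights) + eps)))

loss = weighted_masked_rmse(
    y_true=[1.0, 2.0, 3.0],
    y_pred=[1.0, 0.0, 3.0],
    sigma=[1.0, 1.0, 1.0],
    mask=[0, 1, 0],
)
```

Only the second observation is masked in this usage example, so the loss reduces to the absolute error at that position. Unmasked positions never contribute, which is what lets the model see them as context.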

Abstract

Foundational models have emerged as a powerful paradigm in deep learning, leveraging their capacity to learn robust representations from large-scale datasets and to transfer them effectively to diverse downstream applications such as classification. In this paper, we present Astromer 2, a foundational model specifically designed for extracting light curve embeddings and an enhanced iteration of our self-supervised model for light curve analysis. This paper highlights the advantages of its pre-trained embeddings, compares its performance with that of its predecessor, Astromer 1, and provides a detailed empirical analysis of its capabilities, offering deeper insights into the model's representations. Astromer 2 is pretrained on 1.5 million single-band light curves from the MACHO survey using a self-supervised learning task that predicts randomly masked observations within sequences. Fine-tuning on a smaller labeled dataset allows us to assess its performance in classification tasks. The quality of the embeddings is measured by the F1 score of an MLP classifier trained on Astromer-generated embeddings. Our results demonstrate that Astromer 2 significantly outperforms Astromer 1 across all evaluated scenarios, including limited datasets of 20, 100, and 500 samples per class. The use of weighted per-sample embeddings, which integrate intermediate representations from Astromer's attention blocks, is particularly impactful. Notably, Astromer 2 achieves a 15% improvement in F1 score on the ATLAS dataset compared to prior models, showcasing robust generalization to new datasets. This enhanced performance, especially with minimal labeled data, underscores the potential of Astromer 2 for more efficient and scalable light curve analysis.
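The pretraining task described above, predicting randomly masked observations within a sequence, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `mask_observations`, the default `mask_fraction`, and the scalar `mask_value` are hypothetical. In the model itself the masked magnitudes are replaced by a trainable MASK token, for which a fixed placeholder value stands in here.

```python
import numpy as np

def mask_observations(magnitudes, mask_fraction=0.5, mask_value=0.0, seed=0):
    """Randomly hide a fraction of a light curve's magnitudes.

    Returns the masked input and a boolean array marking hidden positions,
    which are the targets the model would be asked to reconstruct.
    """
    rng = np.random.default_rng(seed)
    magnitudes = np.asarray(magnitudes, dtype=float)
    n = magnitudes.shape[0]

    # Pick distinct positions to hide (hypothetical fraction, not from the paper).
    n_masked = int(round(mask_fraction * n))
    hidden_idx = rng.choice(n, size=n_masked, replace=False)

    is_masked = np.zeros(n, dtype=bool)
    is_masked[hidden_idx] = True

    # Placeholder for the trainable MASK token: overwrite hidden magnitudes.
    masked_input = np.where(is_masked, mask_value, magnitudes)
    return masked_input, is_masked

mags = np.linspace(15.0, 16.0, 10)
masked_input, is_masked = mask_observations(mags, mask_fraction=0.5, mask_value=-99.0)
```

The boolean array returned alongside the masked input identifies which positions a reconstruction loss would be evaluated on, while the untouched observations remain available as context.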
