Astroconformer: The Prospects of Analyzing Stellar Light Curves with Transformer-Based Deep Learning Models

Jia-Shu Pan; Yuan-Sen Ting; Jie Yu

TL;DR

Astroconformer introduces a Transformer-based framework that fuses self-attention with convolution to analyze stellar light curves in the time domain, capturing long-range correlations and phase information that are overlooked by power spectra. The model uses patch embeddings and Rotary Positional Encoding to encode 90-day Kepler light curves into a sequence processed by an 8-head MHSA encoder with convolutional modules, ultimately predicting $\log g$ from the full time series. It achieves state-of-the-art performance, with RMSE as low as $0.017$ dex near $\log g\approx3$ and robust $\nu_{\max}$ estimates (relative median error $<2\%$) on short segments, outperforming both k-NN and CNN baselines and competing with traditional asteroseismic pipelines on limited data. Attention maps provide interpretability, revealing sensitivity to both oscillations and granulation, indicating that Astroconformer leverages non-Gaussian phase information and long-timescale stellar signals. The work demonstrates the potential of Transformer-based architectures for scalable, high-precision asteroseismology in upcoming surveys with varying cadences and observation windows.
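As an illustration of the Rotary Positional Encoding mentioned above, the following is a minimal NumPy sketch of the general technique, not the authors' implementation. The function name `rotary_encode`, the `base` value of 10000, and the toy dimensions are assumptions made for this example. It rotates consecutive feature pairs by an angle proportional to the token position, so that the dot product between an encoded query and key depends only on their relative offset.

```python
import numpy as np

def rotary_encode(x, positions, base=10000.0):
    """Rotate consecutive feature pairs of x by position-dependent angles.

    x: array of shape (n_tokens, d) with d even.
    positions: array of shape (n_tokens,) holding each token's position.
    base: frequency base (assumed value, commonly used for rotary encodings).
    """
    n_tokens, d = x.shape
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    # One rotation frequency per feature pair.
    freqs = base ** (-np.arange(0, d, 2) / d)          # shape (d/2,)
    angles = positions[:, None] * freqs[None, :]       # shape (n_tokens, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Toy check: the query-key score depends only on the position offset (here 4).
rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))
score_a = rotary_encode(q, np.array([3.0])) @ rotary_encode(k, np.array([7.0])).T
score_b = rotary_encode(q, np.array([10.0])) @ rotary_encode(k, np.array([14.0])).T
print(np.allclose(score_a, score_b))
```

Because each pair is only rotated, the norm of every token vector is unchanged; only the relative phase between tokens carries positional information.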

Abstract

Stellar light curves contain valuable information about oscillations and granulation, offering insights into stars' internal structures and evolutionary states. Traditional asteroseismic techniques, primarily focused on power spectral analysis, often overlook the crucial phase information in these light curves. Addressing this gap, recent machine learning applications, particularly those using Convolutional Neural Networks (CNNs), have made strides in inferring stellar properties from light curves. However, CNNs are limited by their localized feature extraction capabilities. In response, we introduce $\textit{Astroconformer}$, a Transformer-based deep learning framework, specifically designed to capture long-range dependencies in stellar light curves. Our empirical analysis centers on estimating surface gravity ($\log g$), using a dataset derived from single-quarter Kepler light curves with $\log g$ values ranging from 0.2 to 4.4. $\textit{Astroconformer}$ demonstrates superior performance, achieving a root-mean-square error (RMSE) of 0.017 dex at $\log g\approx3$ in data-rich regimes and up to 0.1 dex in sparser areas. This performance surpasses both k-nearest neighbor models and advanced CNNs. Ablation studies highlight the influence of receptive field size on model effectiveness, with larger fields correlating to improved results. $\textit{Astroconformer}$ also excels in extracting $\nu_{\max}$ with high precision. It achieves less than 2% relative median absolute error for 90-day red giant light curves. Notably, the error remains under 3% for 30-day light curves, whose oscillations are undetectable by a conventional pipeline in 30% of cases. Furthermore, the attention mechanisms in $\textit{Astroconformer}$ align closely with the characteristics of stellar oscillations and granulation observed in light curves.
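To make the two error metrics quoted above concrete, here is a small NumPy sketch. The function names and the toy numbers are invented for illustration. The definition of the relative median absolute error used here, the median of $|\hat{y}-y|/y$, is an assumption, since the abstract does not spell out the formula.

```python
import numpy as np

def rmse(pred, truth):
    """Root-mean-square error, e.g. in dex when applied to log g."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def relative_median_abs_error(pred, truth):
    """Median of |pred - truth| / truth (assumed definition)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.median(np.abs(pred - truth) / truth))

# Toy values only, not results from the paper.
logg_true = [3.00, 3.00, 2.50]
logg_pred = [3.00, 3.10, 2.50]
numax_true = [100.0, 50.0, 30.0]   # hypothetical nu_max values
numax_pred = [102.0, 49.0, 30.0]

print(rmse(logg_pred, logg_true))
print(relative_median_abs_error(numax_pred, numax_true))
```

RMSE is sensitive to outliers because errors are squared before averaging, whereas a median-based relative error describes the typical fractional deviation and is robust to a few poorly fitted stars.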

Paper Structure (23 sections, 5 equations, 10 figures, 1 table)

Introduction
Relevant Studies and Motivation
Astroconformer: A Transformer-Based Method to Analyze Stellar Light Curves
Self-attention mechanism
Astroconformer Architecture
Embedding
Astroconformer Encoder
Pooling and Prediction Layer
Data
Sample Selection
Light Curve Pre-Processing
Results
Compared with k-NN Based Methods
Transformer Models versus Convolutional Neural Networks
Comparison with Asteroseismic Pipelines
...and 8 more sections

Figures (10)

Figure 1: The figure contrasts the limited receptive field of CNNs with the global receptive field afforded by self-attention mechanisms, especially in the context of long sequences like 4-year Kepler light curves. (a) In CNNs, multiple convolutional layers with small receptive fields are stacked to achieve a broader receptive field in deeper layers. The heatmap illustrates the pairwise correlations between segments, showing that each segment primarily incorporates information only from its immediate neighbors. (b) Self-attention, on the other hand, calculates the correlations between all segments of the light curve and incorporates this global information into its output.
Figure 2: A schematic illustration of the self-attention mechanism within Transformer models and its capability for extracting long-range information. The top panel depicts a basic single-head self-attention mechanism. In this setup, the input is duplicated into three copies, each subjected to distinct linear transformations defined by the learnable matrices $\boldsymbol{W}^Q$, $\boldsymbol{W}^K$, and $\boldsymbol{W}^V$. Two of these copies, $\boldsymbol{Q}$ and $\boldsymbol{K}$, are used for an inner dot product to compute similarities between different timestamps. These computed attention values are subsequently merged with the remaining $\boldsymbol{V}$ copy to produce a new representation of the input sequence. The bottom panel showcases an extension to multi-head self-attention (MHSA). While retaining the same number of learnable parameters, MHSA partitions the linear transformations into separate blocks. The cross-matching between these blocks enables the capture of diverse correlations within sequences, accommodating different types of relevancy, such as varying time scales.
Figure 3: Architecture of Astroconformer. The icon labeled $E$ represents the MLP employed in the model. Stellar light curves are partitioned into patches of size 20 (corresponding to a time span of 10 hours), each of which is then transformed into a vector via a fully connected layer denoted by $E$. These vectors are processed through the Astroconformer encoder to extract local and global features. The Astroconformer architecture consists of multiple such blocks, each incorporating a learnable MHSA module followed by two convolutional modules. To facilitate training, skip connections and subsequent layer normalization are applied between every pair of adjacent modules within each Astroconformer block. Vectors output by the Astroconformer encoder are aggregated through average pooling to produce the final representation of the entire light curve. This representation vector is subsequently fed into a final MLP for predicting the $\log g$ of stars.
Figure 4: Distribution of surface gravity versus effective temperature for our two data sets. The data set from yu18 consists of 16,094 giant stars with $\log g$ values ranging from 1.5 to 3.3. The data set from swan builds on yu18 by adding turnoff stars and extending to the tip of the red giant branch (TRGB); after the data curation described in the text, it contains 14,003 stars.
Figure 5: A comparative study of Astroconformer and the k-nearest neighbor-based method employed in the swan. Astroconformer demonstrates enhanced generalization and yields fewer outlier inferences, particularly in the upper giant branch. It also consistently outperforms the swan in terms of RMSE across the full $\log g$ range. The top and middle panels show the inferred $\log g$ values obtained from Astroconformer and the swan, respectively, plotted against the asteroseismic $\log g$. The bottom panel illustrates the running mean in RMSE (with a window size of 0.4 dex) for both algorithms.
...and 5 more figures
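
The captions of Figures 2 and 3 describe patching a light curve, embedding each patch, applying self-attention, and average pooling. The following NumPy sketch walks through those steps for a single attention head. It is an illustration under stated assumptions, not the Astroconformer code: the weights are random rather than learned, the embedding is a single linear map without bias, `d_model = 16` is a hypothetical width, the flux is synthetic noise sampled every 30 minutes (consistent with 20 samples spanning 10 hours), and the $1/\sqrt{d_k}$ scaling follows the standard scaled dot-product form, which the caption does not state explicitly.

```python
import numpy as np

def patchify(flux, patch_size=20):
    """Split a 1-D light curve into non-overlapping patches, dropping any remainder."""
    flux = np.asarray(flux, dtype=float)
    n_patches = len(flux) // patch_size
    return flux[: n_patches * patch_size].reshape(n_patches, patch_size)

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over tokens x of shape (n_tokens, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise similarity of all patches
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v, attn

rng = np.random.default_rng(0)
flux = rng.normal(size=90 * 48)               # synthetic: 90 days at 30-minute sampling
patches = patchify(flux, patch_size=20)       # one patch per 10 hours

d_model = 16                                  # hypothetical embedding width
w_embed = rng.normal(size=(20, d_model)) / np.sqrt(20)
tokens = patches @ w_embed                    # linear stand-in for the embedding E

w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
pooled = out.mean(axis=0)                     # average pooling over all patches

print(patches.shape, attn.shape, pooled.shape)
```

The attention matrix has one row and one column per patch, so every 10-hour patch can draw on every other patch in the light curve in a single layer. This is the global receptive field contrasted with stacked convolutions in Figure 1.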
