Distinguishing a planetary transit from false positives: a Transformer-based classification for planetary transit signals

Helem Salinas, Karim Pichara, Rafael Brahm, Francisco Pérez-Galarce, Domingo Mery

TL;DR

This paper tackles the challenge of distinguishing true exoplanet transits from false positives in large TESS light-curve datasets. It introduces a Transformer-based classifier with three encoders for local and global flux views plus stellar/transit parameters, leveraging self-attention and attention maps for interpretability. The model achieves competitive performance relative to CNN-based methods and demonstrates that incorporating centroid information improves both accuracy and interpretability. The work highlights the practical potential of attention-based models for efficient, interpretable screening of exoplanet candidates in large-scale surveys, with future work aimed at handling longer light curves and end-to-end classification.

Abstract

Current space-based missions, such as the Transiting Exoplanet Survey Satellite (TESS), provide a large database of light curves that must be analysed efficiently and systematically. In recent years, deep learning (DL) methods, particularly convolutional neural networks (CNNs), have been used to classify transit signals of candidate exoplanets automatically. However, CNNs have some drawbacks; for example, they require many layers to capture dependencies in sequential data, such as light curves, making the network so large that it eventually becomes impractical. The self-attention mechanism is a DL technique that attempts to mimic the action of selectively focusing on some relevant things while ignoring others. Models such as the Transformer architecture were recently proposed for sequential data with successful results. Based on these successful models, we present a new architecture for the automatic classification of transit signals. Our proposed architecture is designed to capture the most significant features of a transit signal and stellar parameters through the self-attention mechanism. In addition to model prediction, we take advantage of attention map inspection, obtaining a more interpretable DL approach. Thus, we can identify the relevance of each element in differentiating a transit signal from false positives, simplifying the manual examination of candidates. We show that our architecture achieves competitive results compared with the CNNs applied to recognizing exoplanetary transit signals in data from the TESS telescope. Based on these results, we demonstrate that applying this state-of-the-art DL model to light curves can be a powerful technique for transit signal detection while offering a level of interpretability.

Paper Structure (37 sections, 9 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Description of the multihead attention mechanism defined in Vaswani et al. (2017). a) Multiplication of the input sequence $X$ by the weight matrices $\mathbf{W}^V$, $\mathbf{W}^K$ and $\mathbf{W}^Q$ to produce $\mathbf{V}$, $\mathbf{K}$ and $\mathbf{Q}$, respectively. b) Multihead attention, where the values $\mathbf{V}$, keys $\mathbf{K}$, and queries $\mathbf{Q}$ are linearly projected $h$ times with different learned linear projections. The multihead attention component then computes the outputs of the $h$ heads in parallel. Finally, these are concatenated and projected, resulting in the final values.
  • Figure 2: Scheme of the proposed architecture for the analysis of transit signals. The architecture consists of three encoders, one each for i) the local view, ii) the global view, and iii) the stellar and transit parameters. The first encoder block corresponds to the representation of the local view. Its input is the concatenation of the flux and centroid time series. A convolutional embedding layer follows, which transforms this input into a vector representation with a positional encoding. This vector is the input of the multihead attention block, which consists of $N$ identical layers and a feed-forward sub-layer. The output of the multihead attention block is passed as input to a max pooling layer. The second encoder is implemented for the global view: its flux and centroid time series are likewise concatenated and follow the same steps as in the first encoder. The third encoder receives the stellar/transit parameter information, which a linear layer transforms into a vector representation. This representation is the input of the multihead attention block. Finally, the output of each encoder is passed as input to a linear layer with softmax for class prediction.
  • Figure 3: Confusion matrix resulting from the application of our architecture to the evaluation dataset, which contains the confirmed planets (CP) and known planets (KP) described in the Datasets section. "Planet" corresponds to positive instances and "no planet" to negative instances.
  • Figure 4: Precision-recall curves for the test sets. The figure shows better performance when the centroid information is included in model training. Furthermore, there is only a slight difference between training the model with only the local view of the transit plus the centroid and training it without the centroid but with both views.
  • Figure 5: Average entropy of attention distributions. The blue line indicates the average attention entropy calculated for each layer, where each layer has 8 attention heads. The points represent the average value of the attention entropy of each head. The first four heads (1-4) of each layer are the blue dots, and the red dots correspond to the last four heads (5-8).
  • ...and 4 more figures
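
The multihead attention described in the Figure 1 caption and the per-head attention entropy summarized in the Figure 5 caption can be illustrated together with a minimal sketch. It is not the authors' implementation. It assumes random projection matrices in place of learned ones, a toy input of 6 time steps with 8 features, 2 heads rather than the 8 mentioned in Figure 5, and Shannon entropy in nats averaged over the query positions of each head (the paper's exact entropy definition may differ). All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Assumption: random Gaussian weights stand in for learned projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def multihead_attention(x, n_heads, d_head, rng):
    """Run n_heads independent heads, concatenate them, project to d_model."""
    d_model = len(x[0])
    head_outputs, head_weights = [], []
    for _ in range(n_heads):
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        out, weights = attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        head_outputs.append(out)
        head_weights.append(weights)
    # Concatenate the head outputs along the feature axis, step by step.
    concat = [sum((head[t] for head in head_outputs), []) for t in range(len(x))]
    w_o = random_matrix(n_heads * d_head, d_model, rng)
    return matmul(concat, w_o), head_weights


def mean_entropy(weights):
    """Average Shannon entropy (in nats) of the attention rows of one head."""
    rows = [-sum(p * math.log(p) for p in row if p > 0.0) for row in weights]
    return sum(rows) / len(rows)


# Toy input: 6 time steps, 8 features per step (hypothetical sizes).
rng = random.Random(0)
seq_len, d_model, n_heads = 6, 8, 2
x = random_matrix(seq_len, d_model, rng)
output, head_weights = multihead_attention(x, n_heads, d_model // n_heads, rng)
entropies = [mean_entropy(w) for w in head_weights]
```

In this sketch, a head whose entropy is close to `math.log(seq_len)` spreads its attention almost uniformly over the sequence, while a value near zero means the head concentrates on a few positions.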