Learning Representations of Event Time Series with Sparse Autoencoders for Anomaly Detection, Similarity Search, and Unsupervised Classification

Steven Dillmann, Juan Rafael Martínez-Galarza

TL;DR

This work proposes novel two- and three-dimensional tensor representations for event time series, coupled with sparse autoencoders that learn physically meaningful latent representations. These representations support a variety of downstream tasks, including anomaly detection, similarity-based retrieval, semantic clustering, and unsupervised classification.
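
The tensor representations can be illustrated with a minimal sketch. Given an event file as arrays of photon arrival times (s) and energies (keV), the hypothetical helper `event_tensors` bins the events into an $E$–$t$ map and, assuming $dt$ denotes the time between consecutive events, into an $E$–$t$–$dt$ cube. The bin counts, the 0.5–7 keV band, the $dt$ range, log scaling, and sum-to-one normalization are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

def event_tensors(times, energies, n_t=24, n_e=16, n_dt=16,
                  e_range=(0.5, 7.0), dt_range=(0.1, 1e4)):
    """Bin one event file into a 2D E-t map and a 3D E-t-dt cube.

    Hypothetical settings: bin counts, keV band, dt range, log scaling,
    and sum-to-one normalization are illustrative choices only.
    """
    t = np.asarray(times, dtype=float)
    e = np.asarray(energies, dtype=float)
    if t.size < 2:
        raise ValueError("need at least two events to define dt")
    order = np.argsort(t)
    t, e = t[order], e[order]

    # Rescale time to [0, 1] so event files of any duration share one grid
    span = t[-1] - t[0]
    t_norm = (t - t[0]) / span if span > 0 else np.zeros_like(t)

    # Log-spaced energy axis over the assumed band (energies clipped into it)
    log_e = np.log10(np.clip(e, *e_range))
    e_edges = np.linspace(np.log10(e_range[0]), np.log10(e_range[1]), n_e + 1)
    t_edges = np.linspace(0.0, 1.0, n_t + 1)

    # 2D E-t map: event counts per (time, energy) cell
    et_map, _, _ = np.histogram2d(t_norm, log_e, bins=[t_edges, e_edges])

    # Inter-arrival times: dt[i] = t[i+1] - t[i], assigned to event i+1
    log_dt = np.log10(np.clip(np.diff(t), *dt_range))
    dt_edges = np.linspace(np.log10(dt_range[0]), np.log10(dt_range[1]), n_dt + 1)
    sample = np.column_stack([t_norm[1:], log_e[1:], log_dt])
    cube, _ = np.histogramdd(sample, bins=[t_edges, e_edges, dt_edges])

    # Normalize each tensor to unit total so bright and faint sources are comparable
    return et_map / et_map.sum(), cube / cube.sum()
```

Called on one event list, it returns arrays of shape `(n_t, n_e)` and `(n_t, n_e, n_dt)`. A fixed $dt$ range (rather than a per-file one) keeps cubes from different event files on a common grid.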

Abstract

Event time series are sequences of discrete events occurring at irregular time intervals, each associated with a domain-specific observational modality. They are common in domains such as high-energy astrophysics, computational social science, cybersecurity, finance, healthcare, neuroscience, and seismology. Their irregular, unstructured nature poses significant challenges for extracting meaningful patterns and identifying salient phenomena with conventional techniques. We propose novel two- and three-dimensional tensor representations for event time series, coupled with sparse autoencoders that learn physically meaningful latent representations. These embeddings support a variety of downstream tasks, including anomaly detection, similarity-based retrieval, semantic clustering, and unsupervised classification. We demonstrate our approach on a real-world dataset from X-ray astronomy, showing that these representations successfully capture temporal and spectral signatures and isolate diverse classes of X-ray transients. Our framework offers a flexible, scalable, and generalizable solution for analyzing complex, irregular event time series across scientific and industrial domains.
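
The sparse autoencoder idea can be sketched in NumPy. The hypothetical class `TinySparseAutoencoder` uses one dense ReLU encoder layer and a linear decoder, trained by plain gradient descent on the objective $\frac{1}{N}\sum_i \lVert \hat{x}_i - x_i \rVert_2^2 + \lambda \frac{1}{N}\sum_i \lVert z_i \rVert_1$, where the L1 term on the latent code $z_i$ encourages sparse embeddings. The architecture, the L1 form of the penalty, and all hyperparameters are assumptions for illustration; the paper's model may use convolutional layers or a different sparsity constraint. The input here is synthetic, standing in for flattened tensor representations.

```python
import numpy as np

class TinySparseAutoencoder:
    """One-hidden-layer autoencoder with an L1 sparsity penalty on the code.

    Hypothetical stand-in: dense ReLU encoder, linear decoder, L1 penalty.
    """

    def __init__(self, n_in, n_latent, sparsity=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_latent))
        self.b1 = np.zeros(n_latent)
        self.W2 = rng.normal(0.0, 0.1, (n_latent, n_in))
        self.b2 = np.zeros(n_in)
        self.sparsity = sparsity

    def encode(self, X):
        # ReLU keeps codes non-negative; the L1 term encourages many exact zeros
        return np.maximum(X @ self.W1 + self.b1, 0.0)

    def loss(self, X):
        z = self.encode(X)
        X_hat = z @ self.W2 + self.b2
        n = X.shape[0]
        recon = np.sum((X_hat - X) ** 2) / n
        return recon + self.sparsity * np.sum(np.abs(z)) / n

    def step(self, X, lr=0.05):
        n = X.shape[0]
        pre = X @ self.W1 + self.b1
        z = np.maximum(pre, 0.0)
        X_hat = z @ self.W2 + self.b2
        # Backpropagate reconstruction error plus the L1 subgradient (z >= 0)
        d_out = 2.0 * (X_hat - X) / n
        d_z = d_out @ self.W2.T + self.sparsity * (z > 0) / n
        d_pre = d_z * (pre > 0)
        self.W2 -= lr * (z.T @ d_out)
        self.b2 -= lr * d_out.sum(axis=0)
        self.W1 -= lr * (X.T @ d_pre)
        self.b1 -= lr * d_pre.sum(axis=0)

# Usage on synthetic data: 64 flattened inputs with 32 cells each
rng = np.random.default_rng(1)
X = rng.random((64, 32))
sae = TinySparseAutoencoder(n_in=32, n_latent=8)
loss_before = sae.loss(X)
for _ in range(200):
    sae.step(X)
loss_after = sae.loss(X)
codes = sae.encode(X)  # (64, 8) non-negative latent vectors
```

The rows of `codes` play the role of the learned embeddings that downstream steps such as clustering or similarity search would operate on.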

Paper Structure

This paper contains 14 sections, 4 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Two-dimensional t-SNE projection of the latent space learned by the sparse autoencoder (SAE) applied to the $E$–$t$–$dt$ cubes. Panel A: Points are color-coded by the variability index of the corresponding X-ray sources. Panel B: Points are color-coded by the hard-to-soft X-ray hardness ratio. Panel C: Known dips, flares, and pulsating sources (crosses) occupy distinct clusters in the embedding space, enabling the identification of new transient candidates via clustering and similarity searches.
  • Figure 2: Nearest-neighbor retrieval results in the learned latent space for three representative transient types: dips from a low-mass X-ray binary (top row, 300 s bins), a flare from a young stellar object (middle row, 400 s bins), and pulsations from a pulsar (bottom row, 100 s bins). For each target light curve (column 1), we show its three nearest neighbors (columns 2–4). The retrieved neighbors correspond to physically similar phenomena: dips from low-mass X-ray binaries, flares from young stars or variable stars, and pulsations from pulsars.
  • Figure 3: The top row shows 300 s bin light curves from event files featuring a dip (blue), a flare (red), and pulsations (green), respectively. The middle row shows the corresponding $E$–$t$ maps, and the bottom row shows the corresponding $E$–$t$–$dt$ cubes for these event files.
  • Figure 4: Two-dimensional t-SNE projection of the latent space learned by the SAE applied to the $E$–$t$ maps. Panel A: Points are color-coded by the variability index of the corresponding X-ray sources. Panel B: Points are color-coded by the hard-to-soft X-ray hardness ratio. Panel C: Known dips, flares, and pulsating sources (crosses) are distributed across the embedding space.
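
The nearest-neighbor retrieval in Figure 2 amounts to ranking latent vectors by their distance to a query embedding. A minimal sketch, assuming Euclidean distance on the learned latent vectors (the paper's metric is not stated here), with the hypothetical helper `nearest_neighbors`:

```python
import numpy as np

def nearest_neighbors(latents, query_index, k=3):
    """Return indices of the k embeddings closest to latents[query_index].

    Assumes Euclidean distance in latent space; the query itself is excluded.
    Requires k < len(latents).
    """
    Z = np.asarray(latents, dtype=float)
    d = np.linalg.norm(Z - Z[query_index], axis=1)
    d[query_index] = np.inf  # never return the query as its own neighbor
    idx = np.argpartition(d, k)[:k]  # k smallest distances, unordered
    return idx[np.argsort(d[idx])]  # order them from nearest to farthest
```

With `k=3`, this mirrors the three neighbors shown per target light curve in Figure 2. `np.argpartition` avoids a full sort, which matters when searching large embedding catalogs.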