A Poisson Process AutoDecoder for X-ray Sources

Yanke Song, Victoria Ashley Villar, Juan Rafael Martinez-Galarza, Steven Dillmann

TL;DR

PPAD addresses the challenge of reconstructing X-ray light curves from Poisson photon arrivals by proposing a Poisson-process-aware, unsupervised framework. It models photon arrivals with an inhomogeneous Poisson process, represents light curves as continuous neural fields, and learns fixed-length source embeddings via an encoder-less autodecoder, augmented with positional encoding and total-variation regularization. The method yields high-resolution rate reconstructions and informative latent representations that support downstream tasks such as hardness/variability regression, source-type classification, and anomaly detection on Chandra data. This approach enables scalable, label-free analysis of vast X-ray time-domain datasets while respecting the Poisson nature of the data and providing resolution-flexible modeling across energy bands.

Abstract

X-ray observing facilities, such as the Chandra X-ray Observatory and eROSITA, have detected millions of astronomical sources associated with high-energy phenomena. The arrival of photons as a function of time follows a Poisson process whose rate can vary by orders of magnitude, presenting obstacles for common tasks such as source classification, physical property derivation, and anomaly detection. Previous work has either failed to directly capture the Poisson nature of the data or focused only on reconstructing the Poisson rate function. In this work, we present the Poisson Process AutoDecoder (PPAD). PPAD is a neural field decoder that maps fixed-length latent features to continuous Poisson rate functions across energy band and time via unsupervised learning. PPAD reconstructs the rate function and yields a representation at the same time. We demonstrate the efficacy of PPAD via reconstruction, regression, classification, and anomaly detection experiments using the Chandra Source Catalog.
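
To make the Poisson-process framing concrete, the sketch below evaluates the standard negative log-likelihood of an inhomogeneous Poisson process, $-\log L = \int_{t_0}^{t_1} r(t)\,dt - \sum_i \log r(t_i)$, for a candidate rate function and a list of photon arrival times. This is a minimal illustration and not the paper's final loss, which is not reproduced in this excerpt. The function names, the trapezoid-rule integration, and the discrete form of the total-variation penalty are all assumptions made for the example.

```python
import numpy as np

def poisson_process_nll(rate_fn, event_times, t_start, t_end, n_grid=2001):
    """Negative log-likelihood of arrival times under an inhomogeneous Poisson process.

    rate_fn must accept a 1-D float array of times and return positive rates.
    (Hypothetical helper; the grid size is an arbitrary choice.)
    """
    event_times = np.asarray(event_times, dtype=float)
    grid = np.linspace(t_start, t_end, n_grid)
    rates = rate_fn(grid)
    # Expected number of events: integral of the rate, via the trapezoid rule.
    expected_counts = np.sum(0.5 * (rates[1:] + rates[:-1]) * np.diff(grid))
    # Sum of log-rates evaluated at the observed arrival times.
    log_rate_sum = np.sum(np.log(rate_fn(event_times)))
    return expected_counts - log_rate_sum

def total_variation(rate_fn, t_start, t_end, n_grid=2001):
    """Discrete total variation of the rate on a grid (assumed form of the penalty)."""
    grid = np.linspace(t_start, t_end, n_grid)
    return np.sum(np.abs(np.diff(rate_fn(grid))))

# Usage: five photons in a 10-second window, scored under a constant rate of 2 counts/s.
constant_rate = lambda t: np.full_like(t, 2.0)
arrivals = [1.0, 2.5, 4.0, 7.0, 9.5]
nll = poisson_process_nll(constant_rate, arrivals, 0.0, 10.0)
penalty = total_variation(constant_rate, 0.0, 10.0)
print(nll, penalty)
```

Because the likelihood is evaluated directly on arrival times, no binning of the event file is needed; a regularized objective could then take the form `nll + weight * penalty`, with `weight` a hypothetical hyperparameter.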

Paper Structure

This paper contains 21 sections, 8 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Unlike an autoencoder, whose latent vectors are produced by an encoder, an autodecoder accepts latent vectors directly as inputs. A randomly initialized latent vector is assigned to each data point (event file) at the beginning of training, and the latent vectors are optimized together with the decoder weights through gradient descent. At inference time on a new data point, the decoder weights are frozen, and a new latent vector is optimized via gradient descent.
  • Figure 2: Illustration of PPAD. Each latent vector is concatenated with the positionally encoded time $t$ and fed to the shared ResNet. The network outputs the value $r(t)$ of the rate function at time $t$, which, together with the values at other times, yields the reconstructed rate function $r$. The rate function $r$ is then used to compute the loss function in \ref{eqn:loss_final} against the event files. When trained with multiple event files, all event files share the same ResNet weights, but each has its own latent vector. Gradients are back-propagated to both the ResNet and the latents.
  • Figure 3: Binned event files vs. light curves reconstructed by PPAD. Rows from top to bottom show the total, soft, medium, and hard band rates. Event files are binned every $5$ minutes (an arbitrary choice), and reconstructed light curve rates are normalized correspondingly (counts per 5 minutes). Binned event files show noisy variations. Reconstructed light curves, on the other hand, smooth out the inherent stochasticity of event files while still picking up conspicuous trends.
  • Figure 4: Top $2$ principal components of the latent features and the corresponding hardness ratios. The plot shows a strong relation between the learned representations and physically meaningful features.
  • Figure 5: Targeted anomaly (upper left) and the $15$ sources closest to it in the latent space. Almost all of the retrieved sources are low-count hard-band flares, like the targeted anomaly.
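
The autodecoder training scheme described in the Figure 1 caption can be sketched in a few lines. The example below is a toy under stated assumptions and not the PPAD implementation: a shared linear map `W` stands in for the ResNet decoder, mean squared error stands in for the Poisson-process loss, and the function names, learning rates, and step counts are hypothetical. It shows only the two phases: joint optimization of the decoder weights and per-data-point latents during training, and latent-only optimization with frozen weights at inference.

```python
import numpy as np

def train_autodecoder(X, latent_dim=3, lr=0.5, n_steps=2000, seed=0):
    """Jointly fit a shared linear decoder W and one latent column per data point.

    X has shape (n_features, n_points); each column is one data point.
    """
    rng = np.random.default_rng(seed)
    n_features, n_points = X.shape
    W = 0.1 * rng.standard_normal((n_features, latent_dim))  # shared decoder weights
    Z = 0.1 * rng.standard_normal((latent_dim, n_points))    # randomly initialized latents
    losses = []
    for _ in range(n_steps):
        residual = W @ Z - X
        losses.append(np.mean(residual ** 2))
        scale = 2.0 / residual.size
        grad_W = scale * residual @ Z.T   # gradient w.r.t. decoder weights
        grad_Z = scale * W.T @ residual   # gradient w.r.t. every latent vector
        W -= lr * grad_W
        Z -= lr * grad_Z
    return W, Z, losses

def infer_latent(W, x_new, lr=0.02, n_steps=2000):
    """Fit a latent for a new data point while the decoder weights stay frozen."""
    z = np.zeros(W.shape[1])
    losses = []
    for _ in range(n_steps):
        residual = W @ z - x_new
        losses.append(np.mean(residual ** 2))
        z -= lr * (2.0 / residual.size) * (W.T @ residual)  # only z is updated
    return z, losses

# Usage on synthetic rank-3 data (20 points, 8 features each).
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 20))
W, Z, train_losses = train_autodecoder(X)
z_new, infer_losses = infer_latent(W, X[:, 0])
print(train_losses[0], train_losses[-1], infer_losses[-1])
```

The columns of `Z` play the role of the fixed-length source embeddings; in this toy setting they could be passed to a nearest-neighbor search or a principal component analysis in the same way the Figure 4 and Figure 5 captions describe for the learned latents.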