TDLight: A Framework for Incremental Light Curve Management and Smart Classification
Xinghang Yu, Ce Yu, Zeguang Shao, Bin Yang
TL;DR
The paper addresses the data management bottlenecks of time-domain astronomy, where growing light-curve volumes and the need for timely analysis clash with offline batch-processing pipelines. It presents TDLight, a unified framework that repurposes the industrial IoT database TDengine with a one-table-per-source storage model and HEALPix indexing, and integrates the LEAVES classifier for incremental, trigger-based classification. Key results include high ingestion throughput (up to 954,000 rows s^-1 for archival data and 541,000 rows s^-1 for streaming data), fast cone searches (~50–100 ms), early classification accuracy above 85% with 50% of the data, and a mechanism to flag high-value candidates. The work provides a Dockerized deployment and web interface to enable practical adoption, accelerating follow-up for time-critical events and informing the design of next-generation time-domain pipelines.
Abstract
With the exponential growth of time-domain surveys, the volume of light curves has increased rapidly. However, many survey projects, such as Gaia, still rely on offline batch-processing workflows in which data are calibrated, merged, and released only after an observing phase is completed. This latency delays scientific analysis and causes many high-value transient events to be buried in archival data, missing the window for timely follow-up. While existing alert brokers handle heterogeneous data streams, it remains difficult to deploy a unified framework that combines high-performance incremental storage with real-time classification on local infrastructure. To address this challenge, we propose TDLight, a scalable system that adapts TDengine, a high-performance time-series database designed for IoT workloads, to astronomical data using a one-table-per-source schema. This architecture supports high-throughput ingestion, achieving 954,000 rows s^-1 for archived data and 541,000 rows s^-1 for incremental streams, while Hierarchical Equal Area isoLatitude Pixelization (HEALPix) indexing enables efficient cone-search queries. Building on this storage layer, we integrate the pre-trained hierarchical Random Forest classifier from the LEAVES framework to construct an incremental classification pipeline. Using the LEAVES dataset, we simulate data accumulation and evaluate a trigger-based strategy that performs early classification at specific observational milestones. In addition, by monitoring the evolution of classification probabilities, the system identifies "high-value candidates": sources that show high early confidence but later undergo significant label shifts. TDLight is released as an open-source Dockerized environment, providing a deployable infrastructure for next-generation time-domain surveys.
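The high-value candidate criterion described above can be illustrated with a minimal Python sketch. It is not the TDLight implementation: the function names, the 0.8 confidence threshold, the 0.5 cutoff for what counts as "early", and the class labels in the usage example are all hypothetical, chosen only to show one way a "confident early, different later" rule could be expressed over probabilities recorded at successive milestones.

```python
from typing import Dict, List, Tuple

# One snapshot: (fraction of the light curve observed, {class label: probability}).
Snapshot = Tuple[float, Dict[str, float]]


def top_label(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable class and its probability."""
    label = max(probs, key=probs.get)
    return label, probs[label]


def is_high_value_candidate(
    history: List[Snapshot],
    early_fraction: float = 0.5,        # hypothetical cutoff for "early"
    min_early_confidence: float = 0.8,  # hypothetical confidence threshold
) -> bool:
    """Flag a source that was classified confidently at an early milestone
    but whose top label differs at the latest milestone."""
    if not history:
        return False
    ordered = sorted(history, key=lambda snap: snap[0])
    final_label, _ = top_label(ordered[-1][1])
    # Compare every early snapshot (excluding the latest) with the final label.
    for fraction, probs in ordered[:-1]:
        if fraction > early_fraction:
            break
        label, confidence = top_label(probs)
        if confidence >= min_early_confidence and label != final_label:
            return True
    return False


# Usage with made-up probabilities and illustrative class labels.
shifting = [
    (0.25, {"RRLyr": 0.85, "EB": 0.15}),
    (0.50, {"RRLyr": 0.60, "EB": 0.40}),
    (1.00, {"RRLyr": 0.20, "EB": 0.80}),
]
stable = [
    (0.25, {"RRLyr": 0.90, "EB": 0.10}),
    (1.00, {"RRLyr": 0.95, "EB": 0.05}),
]
print(is_high_value_candidate(shifting))  # True
print(is_high_value_candidate(stable))    # False
```

In this sketch the thresholds are plain parameters, so a deployment could tune them per survey; the paper's actual definition of a "significant label shift" may differ.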
