MALLORN: Many Artificial LSST Lightcurves based on Observations of Real Nuclear transients

Dylan Magill, Matt Nicholl, Vysakh Anilkumar, Sjoert van Velzen, Xinyue Sheng, Thai Son Mai, Hung Viet Tran, Ngoc Phu Doan, Thomas Moore, Shubham Srivastav, David R. Young, Charlotte R. Angus, Joshua Weston

TL;DR

MALLORN presents a data-driven pipeline that synthesizes LSST-like lightcurves from real ZTF observations of nuclear transients, enabling photometric classification of rare events such as tidal disruption events (TDEs) in the LSST era. The workflow combines Gaussian process interpolation, LSST-depth luminosity rescaling, SNCosmo-based color corrections across six bands, and embedding into the Rubin cadence to produce a large, labeled, six-band time-domain dataset suitable for classifier challenges. The authors release a Kaggle challenge with 10,178 lightcurves (ground-truth and test sets) to benchmark photometric TDE classifiers, and analyze how LSST cadence and band choices affect TDE detectability in their simulated population. This work provides a replicable framework for generating survey-specific transient simulations and offers practical insights for developing classifiers ahead of LSST operations.
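The six-band colour-correction step can be illustrated with a minimal sketch. It assumes, as the Figure 3 caption describes, that only the $r$ band is fitted directly and the other bands follow from model colour differences. The function name `colour_shifted_fluxes` and the `model_mags` input are hypothetical; in practice the model magnitudes would come from an SED model such as those in SNCosmo, which this sketch does not call.

```python
import numpy as np

LSST_BANDS = ("u", "g", "r", "i", "z", "y")

def colour_shifted_fluxes(flux_r, model_mags):
    """Derive six-band fluxes from an r-band lightcurve via model colours.

    flux_r     : r-band flux (any linear unit, e.g. microJy) at N epochs.
    model_mags : dict mapping band -> N model AB magnitudes at the same
                 epochs (hypothetical stand-in for SED-model output).
    """
    flux_r = np.asarray(flux_r, dtype=float)
    m_r_model = np.asarray(model_mags["r"], dtype=float)
    fluxes = {}
    for band in LSST_BANDS:
        # Model colour (m_band - m_r) at each epoch.
        colour = np.asarray(model_mags[band], dtype=float) - m_r_model
        # A magnitude offset dm multiplies flux by 10**(-0.4 * dm).
        fluxes[band] = flux_r * 10.0 ** (-0.4 * colour)
    return fluxes

# Toy usage: a flat model except g, which is bluer early and redder late.
flux_r = np.array([10.0, 50.0, 20.0])
mags = {b: np.full(3, 20.0) for b in LSST_BANDS}
mags["g"] = np.array([19.5, 19.8, 20.3])
print(colour_shifted_fluxes(flux_r, mags)["g"])
```

Working in colour differences rather than absolute model magnitudes lets the observed $r$-band shape drive every band, while the model only supplies the relative offsets.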

Abstract

The Vera C. Rubin Observatory's 10-Year Legacy Survey of Space and Time (LSST) is expected to produce a hundredfold increase in the number of transients we observe. However, spectroscopic resources are insufficient to follow up all of the targets that LSST will provide, so it is necessary to prioritise objects for follow-up observations or inclusion in sample studies based purely on their LSST photometry. We are particularly keen to identify tidal disruption events (TDEs) with LSST, as TDEs are immensely useful for determining black hole parameters and probing our understanding of accretion physics. To assist in these efforts, we present the Many Artificial LSST Lightcurves based on Observations of Real Nuclear transients (MALLORN) data set and the corresponding classifier challenge for identifying TDEs. MALLORN comprises 10,178 simulated LSST light curves, constructed from real Zwicky Transient Facility (ZTF) observations of 64 TDEs, 727 nuclear supernovae and 1407 AGN with spectroscopic labels, using Gaussian process fitting, empirically motivated spectral energy distributions from SNCosmo, and the baseline cadence from the Rubin Survey Simulator. Our approach can easily be adapted to simulate transients for any photometric survey from another survey's observations, requiring only the target survey's limiting magnitudes and an estimate of its observing cadence. The MALLORN Astronomical Classification Challenge, launched on Kaggle on 15 October 2025, allows competitors to test and improve their photometric TDE classifiers on simulated LSST data before the start of LSST.
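The abstract states that adapting the method to a new survey needs only that survey's limiting magnitudes and an estimate of its cadence. The sketch below shows one simple way those two inputs could turn a continuous model lightcurve into simulated survey photometry. It assumes AB magnitudes with fluxes in μJy, a limiting magnitude defined at S/N = 5, and background-dominated noise (a constant flux uncertainty per visit); the function names are hypothetical, and this is not necessarily the noise model used in MALLORN.

```python
import numpy as np

AB_ZEROPOINT_UJY = 23.9  # m_AB = 23.9 - 2.5 log10(flux / microJy)

def mag_to_flux(mag):
    """AB magnitude -> flux in microJy."""
    return 10.0 ** (-0.4 * (np.asarray(mag, dtype=float) - AB_ZEROPOINT_UJY))

def flux_to_mag(flux_ujy):
    """Flux in microJy (must be > 0) -> AB magnitude."""
    return AB_ZEROPOINT_UJY - 2.5 * np.log10(np.asarray(flux_ujy, dtype=float))

def simulate_visits(model_flux, epochs, m_lim, snr_lim=5.0, rng=None):
    """Sample a continuous model at survey epochs and add Gaussian noise.

    model_flux : callable, time array -> true flux in microJy.
    epochs     : visit times from the target survey's cadence (days).
    m_lim      : limiting AB magnitude, taken here as where S/N = snr_lim.
    Returns observed flux, per-visit uncertainty and a detection mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(epochs, dtype=float)
    true_flux = np.asarray(model_flux(t), dtype=float)
    # Background-dominated assumption: the same sigma for every visit.
    sigma = float(mag_to_flux(m_lim)) / snr_lim
    obs_flux = true_flux + rng.normal(0.0, sigma, size=true_flux.shape)
    detected = obs_flux >= snr_lim * sigma
    return obs_flux, np.full(true_flux.shape, sigma), detected

# Toy usage: a Gaussian-shaped flare observed every 3 days (illustrative only).
rng = np.random.default_rng(42)
flare = lambda t: 200.0 * np.exp(-0.5 * ((t - 30.0) / 10.0) ** 2)
flux, err, det = simulate_visits(flare, np.arange(0.0, 90.0, 3.0), m_lim=24.0, rng=rng)
print(det.sum(), "of", det.size, "visits detected")
```

A constant per-visit uncertainty is the simplest choice; a fuller treatment would add source photon noise and use per-visit depths from the simulated cadence.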

Paper Structure

This paper contains 13 sections, 8 equations, 9 figures and 1 table.

Figures (9)

  • Figure 1: Plot of flux ($\mu$Jy) against time relative to peak, showing the $r$-band observations of SN II ZTF22abcfics/SN2022ryv (red) and the GP fit to those observations (orange). The GP fit closely follows the trend of the data points. During large gaps between observations, the uncertainty of the GP fit grows. We show the GP uncertainty only for visualisation; a different method is used to determine the uncertainty in our simulated LSST observations.
  • Figure 2: Damped random walk (DRW) fit to the observed ZTF lightcurve of the AGN ZTF20acxfcmr. The DRW model is better suited to the stochastic variability of AGN lightcurves. During large gaps between observations, the uncertainty of the GP fit grows. We show the GP uncertainty only for visualisation; a different method is used to determine the uncertainty in our simulated LSST observations.
  • Figure 3: Six-band plot of continuous k-corrected lightcurves for SN II ZTF22abcfics/SN2022ryv. The $r$ band is produced via a GP fit to the observed data, whilst the other bands are produced using the colour differences generated from the SNCosmo models. The more erratic variation in the $z$ and $y$ bands results from the model having less data in those bands, but remains well within the uncertainties of the respective bands.
  • Figure 4: Comparison of the simulated continuous $g$-band lightcurve, generated from SNCosmo colour differences at the observed redshift, with the observed $g$-band data for SN II ZTF22abcfics/SN2022ryv. The simulated lightcurve provides a reasonable match to the observed data, supporting the validity of this approach.
  • Figure 5: Example of a final simulated LSST lightcurve based on the real ZTF lightcurve of SN II ZTF22abcfics/SN2022ryv. Random scatter within the measurement uncertainties has been applied to the luminosities to mimic the expected noise in LSST.
  • ...and 4 more figures
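Figures 1 and 2 contrast a GP fit to a supernova with a damped random walk (DRW) fit to an AGN. A DRW is a Gaussian process with an exponential covariance kernel, so both fits can be sketched with the same conditioning code and a swappable kernel. This is a minimal illustration, assuming fixed hyperparameters (no likelihood optimisation), a constant prior mean, and a squared-exponential kernel for the transient case; the paper's actual kernels and fitting procedure may differ, and all names here are hypothetical. It reproduces the behaviour both captions note: the predictive uncertainty grows across large gaps in the data.

```python
import numpy as np

def se_kernel(t1, t2, amp, scale):
    """Squared-exponential kernel: smooth, transient-like variation."""
    dt = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(-0.5 * (dt / scale) ** 2)

def drw_kernel(t1, t2, amp, tau):
    """Exponential kernel, i.e. the covariance of a damped random walk."""
    dt = np.abs(t1[:, None] - t2[None, :])
    return amp**2 * np.exp(-dt / tau)

def gp_predict(t_obs, y_obs, y_err, t_new, kernel, mean=0.0, **hyper):
    """Posterior mean and 1-sigma uncertainty of a GP at times t_new.

    Hyperparameters (**hyper) are fixed inputs here, not fitted.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    t_new = np.asarray(t_new, dtype=float)
    resid = np.asarray(y_obs, dtype=float) - mean
    # Data covariance: kernel plus measurement variance on the diagonal.
    K = kernel(t_obs, t_obs, **hyper) + np.diag(np.asarray(y_err, dtype=float) ** 2)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))  # K^-1 (y - mean)
    K_s = kernel(t_new, t_obs, **hyper)
    mu = mean + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(kernel(t_new, t_new, **hyper)) - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

# Toy usage: two clusters of epochs separated by a large gap.
t = np.array([0.0, 1.0, 2.0, 3.0, 40.0, 41.0, 42.0])
y = np.sin(t / 5.0)
err = np.full_like(t, 0.01)
grid = np.linspace(0.0, 42.0, 8)
mu_se, sd_se = gp_predict(t, y, err, grid, se_kernel, amp=1.0, scale=5.0)
mu_drw, sd_drw = gp_predict(t, y, err, grid, drw_kernel, mean=float(np.mean(y)), amp=1.0, tau=10.0)
print(np.round(sd_se, 3), np.round(sd_drw, 3))
```

Swapping `se_kernel` for `drw_kernel` changes only the covariance model. In a real fit the hyperparameters (`amp`, `scale`, `tau`) would be optimised, for example by maximising the marginal likelihood, rather than fixed by hand.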