Table of Contents
Fetching ...

LenghuSky-8: An 8-Year All-Sky Cloud Dataset with Star-Aware Masks and Alt-Az Calibration for Segmentation and Nowcasting

Yicheng Rui, Xiao-Wei Duan, Licai Deng, Fan Yang, Zhengming Dang, Zhengjun Du, Junhao Peng, Wenhao Chu, Umut Mahmut, Kexin Li, Yiyun Wu, Fabo Feng

Abstract

Ground-based time-domain observatories require minute-by-minute, site-scale awareness of cloud cover, yet existing all-sky datasets are short, daylight-biased, or lack astrometric calibration. We present LenghuSky-8, an eight-year (2018-2025) all-sky imaging dataset from a premier astronomical site, comprising 429,620 $512 \times 512$ frames with 81.2% night-time coverage, star-aware cloud masks, background masks, and per-pixel altitude-azimuth (Alt-Az) calibration. For robust cloud segmentation across day, night, and lunar phases, we train a linear probe on DINOv3 local features and obtain 93.3% $\pm$ 1.1% overall accuracy on a balanced, manually labeled set of 1,111 images. Using stellar astrometry, we map each pixel to local alt-az coordinates and measure calibration uncertainties of approximately 0.37 deg at zenith and approximately 1.34 deg at 30 deg altitude, sufficient for integration with telescope schedulers. Beyond segmentation, we introduce a short-horizon nowcasting benchmark over per-pixel three-class logits (sky/cloud/contamination) with four baselines: persistence (copying the last frame), optical flow, ConvLSTM, and VideoGPT. ConvLSTM performs best but yields only limited gains over persistence, underscoring the difficulty of near-term cloud evolution. We release the dataset, calibrations, and an open-source toolkit for loading, evaluation, and scheduler-ready alt-az maps to boost research in segmentation, nowcasting, and autonomous observatory operations.

LenghuSky-8: An 8-Year All-Sky Cloud Dataset with Star-Aware Masks and Alt-Az Calibration for Segmentation and Nowcasting

Abstract

Ground-based time-domain observatories require minute-by-minute, site-scale awareness of cloud cover, yet existing all-sky datasets are short, daylight-biased, or lack astrometric calibration. We present LenghuSky-8, an eight-year (2018-2025) all-sky imaging dataset from a premier astronomical site, comprising 429,620 frames with 81.2% night-time coverage, star-aware cloud masks, background masks, and per-pixel altitude-azimuth (Alt-Az) calibration. For robust cloud segmentation across day, night, and lunar phases, we train a linear probe on DINOv3 local features and obtain 93.3% 1.1% overall accuracy on a balanced, manually labeled set of 1,111 images. Using stellar astrometry, we map each pixel to local alt-az coordinates and measure calibration uncertainties of approximately 0.37 deg at zenith and approximately 1.34 deg at 30 deg altitude, sufficient for integration with telescope schedulers. Beyond segmentation, we introduce a short-horizon nowcasting benchmark over per-pixel three-class logits (sky/cloud/contamination) with four baselines: persistence (copying the last frame), optical flow, ConvLSTM, and VideoGPT. ConvLSTM performs best but yields only limited gains over persistence, underscoring the difficulty of near-term cloud evolution. We release the dataset, calibrations, and an open-source toolkit for loading, evaluation, and scheduler-ready alt-az maps to boost research in segmentation, nowcasting, and autonomous observatory operations.
Paper Structure (26 sections, 3 equations, 8 figures, 14 tables)

This paper contains 26 sections, 3 equations, 8 figures, 14 tables.

Figures (8)

  • Figure 1: Samples from the dataset. First column represents the raw images that is augmented for cloud segmentation; Second column is the annotation of clear sky (blue), cloudy region (orange) and contamination region (pink); Third column is the mask for background; Forth column is the overlay of the first three columns; Last two columns are the astrometric calibration results for altitude and azimuth of the image.
  • Figure 2: Workflow of this paper. Solid arrows denote dependencies among product data; dashed arrows denote potential dependencies not considered in our experiments.
  • Figure 3: Daily number of captured frames in the dataset.
  • Figure 4: Failure cases of the all-sky camera. Top row (left to right): (1) Covered by dust or sand; (2) Covered by dew or ice; (3) Scattered light caused by mud coverage; (4) Covered by snow. Bottom row (left to right): (5) Obstruction by an external object; (6) Strong nearby light source; (7) Strong distant light source; (8) Camera malfunction.
  • Figure 5: Manual annotation for cloud segmentation. Blue represents cloud regions; orange represents sky regions; Pink represents contamination regions; Columns represents images taken around given time in UTC+8; Rows represents images taken in different moon phase condition. Nearby frames are used by human to determine whether some regions are scatter light or cloud.
  • ...and 3 more figures