
XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images

Elisabeta-Iulia Dima, Pablo Gómez, Sandor Kruk, Peter Kretschmar, Simon Rosen, Călin-Adrian Popa

TL;DR

The paper addresses artefact detection in astronomical images by introducing XAMI, a benchmark dataset derived from the XMM-Newton Optical Monitor with 1000 annotated single-channel images totaling 7021 artefact masks across multiple artefact types. It presents a hybrid CNN-ViT baseline that fuses YOLOv8 for bounding boxes with the Segment Anything Model for segmentation, trained using ground-truth masks and a distilled MobileSAM encoder, with three mask outputs and Kuhn-Munkres assignment for alignment. The results demonstrate a mean IoU around 0.66 on validation and practical inference times (~100 ms per image, with SAM contributing 70–80 ms), indicating the method’s viability for medium-to-large astronomical surveys. The work provides reproducible code and data to enable further development, benchmarking, and integration into existing processing pipelines for artefact-aware astronomy.
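The TL;DR mentions aligning the model's mask outputs with ground truth via Kuhn-Munkres assignment. The paper summary gives no implementation details, so the following is a minimal sketch of that matching step under assumptions: masks are boolean NumPy arrays, the pairwise cost is negative IoU, and SciPy's Hungarian solver does the assignment. The function names (`mask_iou`, `match_masks`) are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def mask_iou(a, b):
    """IoU between two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0


def match_masks(pred_masks, gt_masks):
    """Pair predicted masks with ground-truth masks using the
    Kuhn-Munkres (Hungarian) algorithm, maximising total IoU.

    Returns a list of (pred_idx, gt_idx, iou) triples.
    """
    # Build the cost matrix; negate IoU because the solver minimises cost.
    cost = np.zeros((len(pred_masks), len(gt_masks)))
    for i, p in enumerate(pred_masks):
        for j, g in enumerate(gt_masks):
            cost[i, j] = -mask_iou(p, g)
    rows, cols = linear_sum_assignment(cost)
    return [(i, j, -cost[i, j]) for i, j in zip(rows, cols)]
```

Averaging the IoUs of the matched pairs per image would yield a summary statistic of the kind the paper reports on its validation set.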

Abstract

Reflected or scattered light produces artefacts in astronomical observations that can negatively impact scientific studies. Hence, automated detection of these artefacts is highly beneficial, especially with the increasing amounts of data gathered. Machine learning methods are well-suited to this problem, but currently there is a lack of annotated data to train such approaches to detect artefacts in astronomical observations. In this work, we present a dataset of images from the XMM-Newton space telescope's Optical Monitor camera showing different types of artefacts. We hand-annotated a sample of 1000 images with artefacts, which we use to train automated ML methods. We further demonstrate techniques tailored for accurate detection and masking of artefacts using instance segmentation. We adopt a hybrid approach, combining knowledge from both convolutional neural networks (CNNs) and transformer-based models, and use their respective advantages in segmentation. The presented method and dataset will advance artefact detection in astronomical observations by providing a reproducible baseline. All code and data are made available (https://github.com/ESA-Datalabs/XAMI-model and https://github.com/ESA-Datalabs/XAMI-dataset).

Paper Structure

This paper contains 6 sections, 1 equation, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Examples of artefacts in various space missions. (upper left) An optical ghost detected in Euclid's First Light near-infrared images. (upper right) Ghost rays and stray-light patterns present in the NuSTAR mission. (bottom left) Star loops and dragon's breath artefacts appearing in Hubble Space Telescope images. (bottom right) Star loops and streaks present in the XMM-Newton Optical Monitor.
  • Figure 2: Artefacts appearing in the XMM-OM observation S0148740701 of the QSO 1939+7000 field (U filter).
  • Figure 3: Distribution of annotation bounding boxes across different classes in the XAMI dataset.
  • Figure 4: (left) Cumulative distribution of IoUs between predicted and true masks on the training and validation sets. (right) Comparison of IoU distributions, showing a higher median and greater consistency on the training data and greater variability on the validation data.
  • Figure 5: Detected masks across eight fields within the validation set, with increasing mean IoU between predicted and ground-truth masks. The mean IoU on the validation set images is $0.658\pm0.207$.