XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images
Elisabeta-Iulia Dima, Pablo Gómez, Sandor Kruk, Peter Kretschmar, Simon Rosen, Călin-Adrian Popa
TL;DR
The paper addresses artefact detection in astronomical images by introducing XAMI, a benchmark dataset derived from the XMM-Newton Optical Monitor with 1000 annotated single-channel images totaling 7021 artefact masks across multiple artefact types. It presents a hybrid CNN-ViT baseline that fuses YOLOv8 for bounding boxes with the Segment Anything Model for segmentation, trained using ground-truth masks and a distilled MobileSAM encoder, with three mask outputs and Kuhn-Munkres assignment for alignment. The results demonstrate a mean IoU around 0.66 on validation and practical inference times (~100 ms per image, with SAM contributing 70–80 ms), indicating the method’s viability for medium-to-large astronomical surveys. The work provides reproducible code and data to enable further development, benchmarking, and integration into existing processing pipelines for artefact-aware astronomy.
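To make the matching and scoring steps concrete, here is a minimal sketch of how predicted masks might be aligned with ground-truth masks by Kuhn-Munkres assignment and then scored by mean IoU. It is not the authors' implementation. The cost definition (1 - IoU), the toy 6x6 masks, and the helper names `rect`, `mask_iou`, `min_cost_assignment`, and `match_masks` are all assumptions made for illustration.

```python
def rect(top, left, bottom, right, size=6):
    """Hypothetical helper: a size x size binary mask with a filled rectangle."""
    return [[1 if top <= r < bottom and left <= c < right else 0
             for c in range(size)] for r in range(size)]

def mask_iou(a, b):
    """Intersection over union of two equally sized binary masks (2D lists)."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            inter += 1 if (x and y) else 0
            union += 1 if (x or y) else 0
    return inter / union if union else 0.0

def min_cost_assignment(cost):
    """Kuhn-Munkres (Hungarian) algorithm; returns sorted (row, col) pairs."""
    n, m = len(cost), len(cost[0])
    if n > m:  # the core loop below needs rows <= cols, so transpose and flip
        transposed = [list(col) for col in zip(*cost)]
        return sorted((r, c) for c, r in min_cost_assignment(transposed))
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j] = row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])

def match_masks(preds, gts):
    """Pair predictions with ground truth so that total (1 - IoU) is minimal."""
    iou = [[mask_iou(pm, gm) for gm in gts] for pm in preds]
    cost = [[1.0 - value for value in row] for row in iou]
    return [(i, j, iou[i][j]) for i, j in min_cost_assignment(cost)]

# Toy example: three predicted masks, two ground-truth masks.
gts = [rect(0, 0, 2, 2), rect(3, 3, 6, 6)]
preds = [rect(3, 3, 6, 5), rect(0, 0, 2, 3), rect(0, 4, 1, 6)]
matches = match_masks(preds, gts)
mean_iou = sum(score for _, _, score in matches) / len(matches)
print(matches, round(mean_iou, 4))
```

In this toy case the third prediction overlaps neither ground-truth mask and is left unmatched, which is how a one-to-one assignment handles a surplus of predictions. In practice, `scipy.optimize.linear_sum_assignment` solves the same assignment problem and would normally replace the hand-written solver.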
Abstract
Reflected or scattered light produces artefacts in astronomical observations that can negatively impact scientific studies. Hence, automated detection of these artefacts is highly beneficial, especially given the increasing volume of data being gathered. Machine learning methods are well-suited to this problem, but there is currently a lack of annotated data to train such approaches to detect artefacts in astronomical observations. In this work, we present a dataset of images from the Optical Monitor camera of the XMM-Newton space telescope showing different types of artefacts. We hand-annotated a sample of 1000 images with artefacts, which we use to train automated ML methods. We further demonstrate techniques tailored for accurate detection and masking of artefacts using instance segmentation. We adopt a hybrid approach that combines convolutional neural networks (CNNs) and transformer-based models to exploit their respective strengths in segmentation. The presented method and dataset will advance artefact detection in astronomical observations by providing a reproducible baseline. All code and data are made available (https://github.com/ESA-Datalabs/XAMI-model and https://github.com/ESA-Datalabs/XAMI-dataset).
