YOLO-CIANNA: Galaxy detection with deep learning in radio data: II. Winning the SKA SDC2 using a generalized 3D-YOLO network

D. Cornu; B. Semelin; P. Salomé; X. Lu; S. Aicardi; J. Freundlich; F. Mertens; A. Marchal; G. Sainton; F. Combes; C. Tasse

TL;DR

This work generalizes YOLO-CIANNA to 3D hyperspectral HI cubes, introducing a dedicated 3D CNN backbone and a regression-based 3D bounding-box framework guided by a DIoU loss for robust detection and characterization. The method is trained via a bootstrap strategy on the SKA-like SDC2 data, combining the LDEV truth catalog with dynamic augmentation and a refined selection function to improve detection completeness and purity. It achieves state-of-the-art performance on the MAIN cube, with a 9.5% improvement over the top SDC2 score, a detection purity of 92.3%, and 45% more confirmed sources than competing classical detection tools, while processing the ~1 TB cube in ~30 minutes on a single GPU. The results demonstrate the viability of 3D CNN detectors for large hyperspectral HI data and chart a path toward applying YOLO-CIANNA to observations from SKA and its precursors, including transfer learning and deployment considerations for future surveys.

Abstract

As the scientific exploitation of the Square Kilometre Array (SKA) approaches, there is a need for new advanced data analysis and visualization tools capable of processing large high-dimensional datasets. In this study, we aim to generalize the YOLO-CIANNA deep learning source detection and characterization method to 3D hyperspectral HI emission cubes. We present the adaptations we made to the regression-based detection formalism and the construction of an end-to-end 3D convolutional neural network (CNN) backbone. We then describe a processing pipeline for applying the method to simulated 3D HI cubes from the SKA Observatory Science Data Challenge 2 (SDC2) dataset. The YOLO-CIANNA method was originally developed and used by the MINERVA team, which won the official SDC2 competition. Despite the public release of the full SDC2 dataset, no published result has yet surpassed MINERVA's top score. In this paper, we present an updated version of our method that improves our challenge score by 9.5%. The resulting catalog exhibits a high detection purity of 92.3%, best-in-class characterization accuracy, and contains 45% more confirmed sources than competing classical detection tools. The method is also computationally efficient, processing the full ~1 TB SDC2 data cube in 30 min on a single GPU. These state-of-the-art results highlight the effectiveness of 3D CNN-based detectors for processing large hyperspectral data cubes and represent a promising step toward applying YOLO-CIANNA to observational data from SKA and its precursors.
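
The summary above mentions a DIoU-guided 3D bounding-box regression. As a rough illustration of what such a metric measures, the sketch below computes the standard Distance-IoU (IoU minus the squared center distance normalized by the squared diagonal of the smallest enclosing box) for two axis-aligned 3D boxes. The corner-based box format, the function name `diou_3d`, and the interpretation of the three axes (for instance two sky coordinates plus frequency) are assumptions made for this example; the paper's actual parameterization and loss implementation may differ.

```python
import math


def diou_3d(box_a, box_b):
    """Distance-IoU between two axis-aligned 3D boxes.

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max).
    This corner format is an assumption made for illustration.
    """
    a_min, a_max = box_a[:3], box_a[3:]
    b_min, b_max = box_b[:3], box_b[3:]

    # Overlap length along each axis, clipped at zero when the boxes are disjoint.
    overlap = [max(0.0, min(a_max[i], b_max[i]) - max(a_min[i], b_min[i]))
               for i in range(3)]
    inter = math.prod(overlap)

    vol_a = math.prod(a_max[i] - a_min[i] for i in range(3))
    vol_b = math.prod(b_max[i] - b_min[i] for i in range(3))
    iou = inter / (vol_a + vol_b - inter)

    # Squared distance between the two box centers.
    d2 = sum((0.5 * (a_min[i] + a_max[i]) - 0.5 * (b_min[i] + b_max[i])) ** 2
             for i in range(3))

    # Squared diagonal of the smallest box enclosing both inputs.
    c2 = sum((max(a_max[i], b_max[i]) - min(a_min[i], b_min[i])) ** 2
             for i in range(3))

    return iou - d2 / c2


if __name__ == "__main__":
    # Partially overlapping unit-offset cubes.
    print(diou_3d((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3)))
```

Unlike the plain IoU, which is zero for every pair of disjoint boxes, this quantity keeps decreasing as the centers move apart, so a loss built on it still gives a useful signal when a predicted box does not yet overlap its target.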
