Deep Investigation of Neutral Gas Origins (DINGO): Options for the Processing and Storage of Radio Astronomy Data for robust Deep Spectral Line Imaging in the SKA-Era using uv-Grids

Alexander Williamson, Richard Dodson, Pascal J. Elahi, Jonghwan Rhee, Qian Gong, Martin Meyer, Kristof Rozgonyi, Andreas Wicenec, Jieyang Chen, Norbert Podhorszki, Scott Klasky, Daniel Mitchell

TL;DR

Deep Investigation of Neutral Gas Origins (DINGO) addresses the data-volume and I/O challenges of SKA-era deep HI spectral-line imaging by storing uv-gridded data as an intermediate product and integrating MGARD compression via ADIOS2 within ASKAPSoft. The authors demonstrate substantial data-volume reductions (lossless ~7x; lossy up to ~20x) and significant processing-time improvements through parallel I/O, while assessing the impact on image fidelity to guide safe SKA deployment. They show that lossless MGARD preserves imaging results, whereas aggressive lossy error bounds can bias deconvolution through PSF inaccuracies, which informs the trade-offs for SKA data management. The study provides a practical, scalable blueprint for handling SKA-scale data, guiding storage, transmission, and compute strategies for robust deep spectral-line imaging.

Abstract

The next generation of radio astronomy telescopes is challenging existing data analysis paradigms, as these telescopes have an order of magnitude more antennas and larger bandwidth. Foremost amongst the challenges are deep spectral line surveys, because these have the largest number of epochs and spectral channels per dataset. For example, the Deep Investigation of Neutral Gas Origins (DINGO) project on the Australian Square Kilometre Array Pathfinder (ASKAP) aims to observe over 3,000 hours spread over hundreds of observing sessions, covering two pointings and two frequency settings. The two primary problems encountered when processing this data are the storage required and the fact that processing is primarily I/O limited. To address these issues, we have implemented a deep imaging pipeline in the software ASKAPSoft based on the storage of an intermediate data product, the uv-gridded data, and have demonstrated lossy and lossless compression of this data on ASKAP, using the MGARD and ADIOS2 libraries. We find data compression ratios from a factor of 7 (lossless) up to 20 (using lossy compression with an absolute error bound of $10^{-4}$), and processing is faster by a factor of 7 for lossless compression. We discuss the effectiveness of lossy MGARD compression and its adherence to the designated error bounds, the trade-off between these error bounds and the corresponding compression ratios, as well as the potential consequences of these I/O and storage improvements on the science quality of the data products.
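
The trade-off between an absolute error bound and the resulting compression ratio can be illustrated with a small, self-contained sketch. This is not MGARD and not the ASKAPSoft/ADIOS2 pipeline: it swaps in uniform quantisation followed by zlib as a stand-in for an error-bounded lossy compressor, and it runs on a synthetic, mostly empty grid. The function names (`make_grid`, `compress`, `decompress`, `compression_ratio`, `max_abs_error`), the 20% fill fraction, and the assumption of 4 bytes per stored sample are all hypothetical choices made for illustration, so the ratios it produces are not comparable with those reported above.

```python
import random
import struct
import zlib


def make_grid(n, seed=0, fill=0.2):
    """Synthetic stand-in for a sparse grid: mostly zeros, some Gaussian values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) if rng.random() < fill else 0.0 for _ in range(n)]


def compress(values, abs_bound):
    """Quantise to bins of width 2*abs_bound, then deflate the integer bins.

    Rounding to the nearest bin centre keeps every reconstructed value
    within abs_bound of the original (up to floating-point rounding).
    """
    step = 2.0 * abs_bound
    bins = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(bins)}i", *bins)  # assumes bins fit in int32
    return zlib.compress(raw, 9)


def decompress(blob, abs_bound):
    """Invert compress(): inflate, unpack the bins, rescale to bin centres."""
    step = 2.0 * abs_bound
    raw = zlib.decompress(blob)
    bins = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [b * step for b in bins]


def compression_ratio(values, blob, bytes_per_sample=4):
    """Ratio of uncompressed size (assumed 4 bytes per sample) to compressed size."""
    return (len(values) * bytes_per_sample) / len(blob)


def max_abs_error(original, restored):
    """Largest pointwise absolute difference between the two sequences."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    grid = make_grid(65536)
    for bound in (1e-6, 1e-4, 1e-2):
        blob = compress(grid, bound)
        restored = decompress(blob, bound)
        print(
            f"bound={bound:g}  ratio={compression_ratio(grid, blob):.1f}  "
            f"max_err={max_abs_error(grid, restored):.3g}"
        )
```

Running the script prints one line per error bound. In this toy setting, loosening the bound coarsens the quantisation and so raises the ratio, while the measured maximum error stays within the requested bound; checking adherence of a real compressor to its bound follows the same pattern of comparing the restored data against the original.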

Paper Structure (24 sections, 9 figures)

Figures (9)

  • Figure 1: The graphical representation in EAGLE of the DINGO gridding pipeline. Specific parameters are provided, split among the scattered instances, and merged with the global template; the individual ASKAPSoft imager applications are then run to generate the schedule block residual visibility grids and the subtracted model components. Following this, in another similar graph, the gridded visibilities are combined and imaged, and the model components are added back to produce the final stacked image.
  • Figure 2: The compression ratio for increasing specified error bounds. Shown are the visibility and PSF grid compression ratios for lossless compression, lossy compression with absolute error bounds, and that with relative error bounds.
  • Figure 3: The absolute residuals between the images produced after lossy compression and the uncompressed equivalent. The colour scales are not normalised as the range of values for each image is significantly different. Ignoring the image corners, the smallest error bounds on the right-hand side lead to maximum residuals of $\sim10^{-5}$.
  • Figure 4: The unnormalised 2-point correlation of the residuals of the images produced with varying error bounds. The x-axis represents the spatial frequency bins of the image, where $\ell$=0 is equivalent to structures over the whole field of view and $\ell$=700 is equivalent to structures of 16, half the image resolution. The colours are as for Figure 3, and the same first two error bounds show power at angular scales greater than the resolution, indicating reconstruction errors.
  • Figure 5: (Left) The spectral profile (relative to the RMS of the cube) of a galaxy source within the field of interest. (Right) The residuals of the profile when comparing the original cube with the cubes that have undergone MGARD compression.
  • ...and 4 more figures