Cutana: A High-Performance Tool for Astronomical Image Cutout Generation at Petabyte Scale

Pablo Gómez, Laslo Erik Ruhberg, Kristin Anett Remmelgas, David O'Ryan

TL;DR

The paper tackles the bottleneck of generating millions of source-specific image cutouts from petabyte-scale surveys such as Euclid Q1. It introduces Cutana, a batch-oriented, memory-aware tool that groups sources by tile and uses vectorised NumPy operations to extract cutouts in batches, with support for FITS and Zarr outputs and multiple normalisation schemes. Results show substantial performance gains over the standard Astropy approach: up to 2264 cutouts per second with four workers, with bounded memory (peak ~2.49 GB per worker; ~1.8 GB steady), demonstrating effective scaling in compute-bound regimes. The work enables scalable cutout production for current and future surveys, with planned integration into ESA Datalabs for Euclid DR1 and an open-source release pending licensing.

Abstract

The Euclid Quick Data Release 1 (Q1) encompasses 30 million sources across 63.1 square degrees, marking the beginning of petabyte-scale data delivery through Data Release 1 (DR1) and subsequent releases. Systematic exploitation of such datasets requires extracting millions of source-specific cutouts, yet standard tools like Astropy's Cutout2D process sources individually, creating bottlenecks for large catalogues. We introduce Cutana, a memory-efficient software tool optimised for batch processing in both local and cloud-native environments. Cutana employs vectorised NumPy operations to extract cutout batches simultaneously from FITS tiles, implements automated memory-aware scheduling, and supports both Zarr and FITS output formats with multiple common normalisation schemes (asinh, log, zscale). Cutana outperforms Astropy in all tested Q1 subset scenarios, achieving near-linear scaling and processing thousands of cutouts per second. On just four worker threads, Cutana can process all of Q1 in under four hours. The tool includes an ipywidget interface for parameter configuration and real-time monitoring. Integration with ESA Datalabs is underway for the Euclid DR1 release, with open-source release pending ESA open-source licensing processes.
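
To make the idea of vectorised batch extraction concrete, the following is a minimal sketch of how many cutouts could be gathered from one tile with a single NumPy indexing operation instead of a per-source loop. It is not Cutana's actual implementation: the function names `extract_cutouts` and `asinh_normalise`, the `scale` parameter, the NaN padding at tile edges, and the use of integer pixel centres are all assumptions made for illustration.

```python
import numpy as np


def extract_cutouts(tile, centres_xy, size):
    """Extract square cutouts around many sources with one gather.

    tile        2D array (ny, nx) holding the pixels of one tile
    centres_xy  integer array (n, 2) of pixel centres given as (x, y)
    size        cutout side length in pixels
    Pixels that fall outside the tile are filled with NaN (an assumption).
    """
    centres_xy = np.asarray(centres_xy, dtype=np.int64)
    half = size // 2
    offsets = np.arange(size) - half                    # shape (size,)
    rows = centres_xy[:, 1, None] + offsets[None, :]    # shape (n, size)
    cols = centres_xy[:, 0, None] + offsets[None, :]    # shape (n, size)

    ny, nx = tile.shape
    rows_ok = (rows >= 0) & (rows < ny)
    cols_ok = (cols >= 0) & (cols < nx)
    valid = rows_ok[:, :, None] & cols_ok[:, None, :]   # shape (n, size, size)

    # Clip indices so the gather never reads out of bounds, then mask.
    rows_c = np.clip(rows, 0, ny - 1)
    cols_c = np.clip(cols, 0, nx - 1)
    cutouts = tile[rows_c[:, :, None], cols_c[:, None, :]].astype(np.float32)
    cutouts[~valid] = np.nan
    return cutouts


def asinh_normalise(cutouts, scale=1.0):
    """Apply an asinh stretch; `scale` is a hypothetical softening parameter."""
    return np.arcsinh(cutouts / scale)


# Usage on a synthetic tile: two sources, one of them at the tile corner.
tile = np.arange(100).reshape(10, 10)
batch = extract_cutouts(tile, [(5, 5), (0, 0)], size=3)
stretched = asinh_normalise(batch)
```

The broadcasted index arrays produce an array of shape (n, size, size) in one step, which is the kind of operation that lets the per-source Python overhead be amortised across a whole batch.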

Paper Structure

This paper contains 4 sections and 2 figures.

Figures (2)

  • Figure 1: Cutana's ipywidget-based interface within ESA Datalabs showing configuration panel (left) for source catalogue, output format, normalisation, and processing parameters, alongside real-time cutout preview grid (right) enabling validation of settings before large-scale processing on Euclid Q1 data.
  • Figure 2: Memory consumption comparison processing 200,000 sources across 8 tiles. Astropy (blue) exhibits monotonic growth, whilst Cutana (orange: 1 worker, green: 4 workers) maintains bounded memory through tile-wise recycling. The sawtooth pattern reflects controlled loading cycles preventing exhaustion whilst maintaining throughput.
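
The tile-wise recycling described in the Figure 2 caption can be illustrated with a small sketch: sources are grouped by tile, one tile is loaded at a time, its cutouts are extracted, and the tile is released before the next one is loaded. This is an assumption-laden illustration rather than Cutana's scheduler. The names `group_by_tile` and `iter_tile_batches`, the `load_tile` callback, and the `(tile_id, x, y)` source layout are hypothetical, and sources are assumed to lie far enough from tile edges that no padding is needed.

```python
from collections import defaultdict

import numpy as np


def group_by_tile(sources):
    """Map tile id -> list of (x, y) pixel centres of the sources on that tile."""
    groups = defaultdict(list)
    for tile_id, x, y in sources:
        groups[tile_id].append((x, y))
    return dict(groups)


def iter_tile_batches(sources, load_tile, size):
    """Yield (tile_id, cutouts) one tile at a time.

    Each tile is loaded once, used for all of its sources, and then
    dropped, so at most one tile is held by this generator at any moment.
    """
    half = size // 2
    for tile_id, centres in group_by_tile(sources).items():
        tile = load_tile(tile_id)
        # np.stack copies the slices, so the batch keeps no reference to the tile.
        batch = np.stack(
            [tile[y - half:y - half + size, x - half:x - half + size]
             for x, y in centres]
        )
        del tile  # release the tile before the next one is loaded
        yield tile_id, batch


# Usage with a stand-in loader that records how often each tile is read.
loaded = []


def fake_load_tile(tile_id):
    loaded.append(tile_id)
    return np.full((20, 20), tile_id, dtype=np.float32)


sources = [(1, 5, 5), (2, 6, 6), (1, 10, 10)]
results = list(iter_tile_batches(sources, fake_load_tile, size=3))
```

Because memory use rises when a tile is loaded and falls when it is released, a loop of this shape would be expected to produce a sawtooth memory profile rather than monotonic growth.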