Benchmarking Dimensionality Reduction Methods for High-Dimensional ALMA Image Cubes
Haley N. Scolati, Ryan A. Loomis, Anthony J. Remijan, Kin Long Kelvin Lee
TL;DR
The paper benchmarks dimensionality reduction methods for analyzing high-dimensional ALMA image cubes. It evaluates three linear methods (PCA, sparse PCA, and NMF) plus a neural network autoencoder, focusing on compression efficiency and preservation of astrophysically relevant features. Using public ALMA data across multiple source morphologies, it compares reconstruction accuracy, computational cost, and scalability, and tests generalizability to additional morphologies and data properties. The findings guide data-product generation and archival analysis as ALMA approaches the Wideband Sensitivity Upgrade, highlighting trade-offs between accuracy, efficiency, and scalability.
Abstract
High-dimensional astronomical data cubes provide a wealth of spectral and structural information that can be used to study astrophysical and chemical processes. The complexity and sheer size of these datasets pose significant challenges for their efficient analysis, visualization, and interpretation. In specific astronomical use cases, a number of dimensionality reduction techniques, including traditional linear methods (e.g. principal component analysis) and modern nonlinear methods (e.g. convolutional autoencoders), have been used to tackle this high-dimensional problem. In this study, we assess the strengths, weaknesses, and nuances of various methods in their ability to capture and preserve astronomically relevant features at lower dimensions. We provide recommendations to guide users in identifying these treatments and applying them to their data, and provide insights into the computational scalability of these methods for observatory-level data processing. This benchmark study uses publicly available archival ALMA data from a diverse sampling of source morphologies and observing setups to assess the performance and trade-offs between computational cost, image reconstruction accuracy, and scalability. Finally, we discuss the generalizability of these techniques with regard to data segmentation and labeling algorithms and how they can be exploited for advanced data product generation and streamlined archival analysis as we prepare to enter the era of the ALMA Wideband Sensitivity Upgrade.
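To make the trade-off between reconstruction accuracy and computational cost concrete, the following is a minimal sketch of one such measurement for PCA only, written with NumPy. It is not the authors' pipeline: the synthetic cube, the function names (make_synthetic_cube, pca_reconstruct, benchmark), the (channel, y, x) axis order, the choice to treat each spatial pixel's spectrum as one sample, and the use of mean squared error as the accuracy metric are all assumptions made for illustration.

```python
import time
import numpy as np


def make_synthetic_cube(n_chan=64, ny=32, nx=32, seed=0):
    """Hypothetical stand-in for an image cube (assumption, not ALMA data):
    two Gaussian spectral lines with different spatial footprints plus noise."""
    rng = np.random.default_rng(seed)
    chan = np.arange(n_chan)
    y, x = np.mgrid[0:ny, 0:nx]
    line_a = np.exp(-0.5 * ((chan - 20) / 3.0) ** 2)
    line_b = np.exp(-0.5 * ((chan - 45) / 5.0) ** 2)
    map_a = np.exp(-((x - 10) ** 2 + (y - 12) ** 2) / 30.0)
    map_b = np.exp(-((x - 22) ** 2 + (y - 20) ** 2) / 50.0)
    cube = line_a[:, None, None] * map_a + line_b[:, None, None] * map_b
    return cube + 0.01 * rng.standard_normal(cube.shape)


def pca_reconstruct(cube, n_components):
    """Compress each pixel's spectrum to n_components PCA scores,
    then map the scores back to a cube of the original shape."""
    n_chan, ny, nx = cube.shape
    X = cube.reshape(n_chan, ny * nx).T          # rows: pixels, columns: channels
    mean = X.mean(axis=0)
    # PCA via SVD of the mean-centered data matrix.
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    X_hat = scores @ Vt[:n_components] + mean
    return X_hat.T.reshape(n_chan, ny, nx)


def benchmark(cube, ks=(1, 2, 4, 8)):
    """Return (n_components, mean squared error, wall-clock seconds) per setting."""
    results = []
    for k in ks:
        t0 = time.perf_counter()
        recon = pca_reconstruct(cube, k)
        elapsed = time.perf_counter() - t0
        mse = float(np.mean((cube - recon) ** 2))
        results.append((k, mse, elapsed))
    return results


if __name__ == "__main__":
    for k, mse, elapsed in benchmark(make_synthetic_cube()):
        print(f"k={k:2d}  MSE={mse:.3e}  time={elapsed * 1e3:.1f} ms")
```

Because the synthetic signal is built from two spectral components, the error should drop sharply between one and two retained components and then flatten toward the noise level. The same loop could in principle be repeated for sparse PCA, NMF, or an autoencoder by swapping out pca_reconstruct, which is the kind of side-by-side comparison a benchmark of this type relies on.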
