Deep and Sparse Denoising Benchmarks for Spectral Data Cubes of High-z Galaxies: From Simulations to ALMA observations
Arnab Lahiry, Tanio Díaz-Santos, Jean-Luc Starck, Niranjan Chandra Roy, Daniel Anglés-Alcázar, Grigorios Tsagkatakis, Panagiotis Tsakalides
TL;DR
This work benchmarks four denoising strategies (PCA, ICA, iterative 2D-1D wavelet soft thresholding, and a supervised 3D U-Net) for spectral cubes of high-redshift galaxies across synthetic toy datasets, FIRE mock IFU cubes, and real ALMA observations (CRISTAL and W2246). It demonstrates that unsupervised wavelet methods robustly reduce noise and preserve flux in moderate-SNR data, while the 3D U-Net excels in RMSE reduction and morphology preservation on synthetic and CRISTAL-like data, albeit with hallucination risks at very low SNR and limited performance on morphologies not present in the training distribution. The study emphasizes the value of a synthetic training framework for transfer learning to real observations, and highlights the need for physically informed priors and uncertainty quantification to ensure robustness across diverse morphologies, including diffuse and merger-driven emission. Overall, the proposed framework enables transferable, high-fidelity denoising for ALMA and future IFU datasets from facilities like JWST and VLT/MUSE, enhancing flux conservation and kinematic recovery in faint, high-z galaxies.
Abstract
Beyond cosmic noon, galaxies appear as faint whispers amid noise, yet this epoch is key to understanding massive galaxy assembly. ALMA's sensitivity to cold dust and [C II] emission allows us to probe their interstellar medium, but faint signals make robust denoising essential. We benchmark four denoising strategies: Principal Component Analysis (PCA), Independent Component Analysis (ICA), a sparse unsupervised representation based on iterative soft thresholding (IST) with 2D-1D wavelets, and supervised deep learning with a 3D U-Net. The goal is to identify techniques that suppress noise while preserving flux and morphology across peak SNRs of 2.5-8. The methods are applied to (i) synthetic spectral cubes of rotating toy disk galaxies, (ii) synthetic [C II] IFU cubes from FIRE simulations, and (iii) ALMA [C II] observations of CRISTAL galaxies and W2246-0526. Performance is assessed via RMSE, conservation of flux and spectra, noise reduction, and SNR improvement of the central galaxy. For synthetic cubes, PCA and ICA provide marginal improvement; IST reduces noise effectively at moderate SNRs but can suppress emission at low SNRs; and the U-Net outperforms IST, though it can produce quantifiable hallucinations at lower SNRs. For moderate-SNR observations (ALMA-CRISTAL), the U-Net and IST achieve comparable performance, conserving >91% of the flux and increasing SNR by >6. However, for observations with complex morphologies absent from the training set (W2246), the U-Net underperforms relative to IST, recovering ~80% of the flux, while IST robustly conserves flux and improves SNR by ~3, highlighting generalisation challenges and the need for physically motivated training priors. We conclude that IST is a robust unsupervised denoiser for moderate-SNR data, and that a synthetically trained U-Net can generalise effectively to real data, depending on its training priors. This framework offers a pathway for transferable denoising for ALMA, VLT/MUSE, and JWST.
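As a rough illustration of the quantities named above, the following Python sketch applies the soft-thresholding operator that underlies IST and computes an RMSE and a flux-conservation ratio on a toy cube. It is not the pipeline used in this work: the threshold is applied directly to voxel values rather than to 2D-1D wavelet coefficients and without iteration, and the function names (`soft_threshold`, `rmse`, `flux_ratio`), the Gaussian toy source, the threshold of 2 sigma, and the masked-sum definition of flux conservation are all assumptions made for the example.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink every value towards zero by lam; values with |x| <= lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def rmse(denoised, truth):
    """Root-mean-square error between a denoised cube and the noiseless cube."""
    return float(np.sqrt(np.mean((denoised - truth) ** 2)))


def flux_ratio(denoised, truth, mask):
    """Recovered flux over true flux inside a source mask (1.0 = fully conserved)."""
    return float(denoised[mask].sum() / truth[mask].sum())


# Toy cube (hypothetical): a Gaussian blob in (velocity, y, x) with unit peak.
rng = np.random.default_rng(0)
nz, ny, nx = 16, 32, 32
z, y, x = np.meshgrid(np.arange(nz), np.arange(ny), np.arange(nx), indexing="ij")
truth = np.exp(
    -((x - 16) ** 2 + (y - 16) ** 2) / (2 * 3.0**2) - (z - 8) ** 2 / (2 * 2.0**2)
)

sigma = 0.2  # noise level chosen so that peak SNR = 1 / 0.2 = 5
noisy = truth + rng.normal(0.0, sigma, truth.shape)

# Single voxel-space shrinkage step; the 2-sigma threshold is an arbitrary choice.
denoised = soft_threshold(noisy, 2 * sigma)

mask = truth > 0.1  # assumed source mask built from the noiseless cube
print("RMSE noisy   :", rmse(noisy, truth))
print("RMSE denoised:", rmse(denoised, truth))
print("Flux ratio   :", flux_ratio(denoised, truth, mask))
```

Because soft thresholding shrinks every surviving value by the threshold, a single voxel-space step like this tends to lower the recovered flux, which is one reason to report a flux-conservation measure alongside RMSE.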
