
Deep and Sparse Denoising Benchmarks for Spectral Data Cubes of High-z Galaxies: From Simulations to ALMA observations

Arnab Lahiry, Tanio Díaz-Santos, Jean-Luc Starck, Niranjan Chandra Roy, Daniel Anglés-Alcázar, Grigorios Tsagkatakis, Panagiotis Tsakalides

TL;DR

This work benchmarks four denoising strategies—PCA, ICA, iterative 2D-1D wavelet soft thresholding, and a supervised 3D U-Net—for spectral cubes of high-redshift galaxies across synthetic toy datasets, FIRE mock IFU cubes, and real ALMA observations (CRISTAL and W2246). It demonstrates that while unsupervised wavelet methods robustly reduce noise and preserve flux in moderate-SNR data, the 3D U-Net excels in RMSE reduction and morphology preservation on synthetic and CRISTAL-like data, albeit with hallucination risks at very low SNR and limited performance on morphologies not present in the training distribution. The study emphasizes the value of a synthetic training framework for transfer learning to real observations, while highlighting the need for physically informed priors and uncertainty quantification to ensure robustness across diverse morphologies, including diffuse and merger-driven emission. Overall, the proposed framework enables transferable, high-fidelity denoising for ALMA and future IFU datasets from facilities like JWST and VLT/MUSE, enhancing flux conservation and kinematic recovery in faint, high-z galaxies.
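To make the PCA baseline concrete, here is a minimal NumPy sketch of one common way to denoise a cube with PCA: treat each spaxel spectrum as a sample, keep only the leading k principal components, and project back. The function name `pca_denoise`, the choice of spaxels as samples, and the toy cube are illustrative assumptions, not the authors' implementation. The toy cube is rank one by construction, which flatters PCA; it does not reproduce the paper's finding that PCA gives only marginal improvement on realistic cubes.

```python
import numpy as np


def pca_denoise(cube: np.ndarray, n_components: int) -> np.ndarray:
    """Keep the leading principal components of a (n_chan, ny, nx) cube.

    Each spaxel spectrum is one sample with n_chan features, so the
    components are spectral shapes shared across the field of view.
    """
    n_chan, ny, nx = cube.shape
    X = cube.reshape(n_chan, ny * nx).T           # (n_spaxels, n_chan)
    mean = X.mean(axis=0, keepdims=True)          # mean spectrum
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    k = n_components
    Xk = (U[:, :k] * s[:k]) @ Vt[:k] + mean       # rank-k approximation
    return Xk.T.reshape(n_chan, ny, nx)


# Hypothetical toy cube: one Gaussian blob emitting one Gaussian line.
rng = np.random.default_rng(0)
v = np.linspace(-1.0, 1.0, 40)
yy, xx = np.mgrid[:32, :32]
spectrum = np.exp(-0.5 * (v / 0.2) ** 2)
image = np.exp(-0.5 * ((xx - 16) ** 2 + (yy - 16) ** 2) / 4.0 ** 2)
truth = spectrum[:, None, None] * image[None, :, :]
noisy = truth + rng.normal(0.0, 0.2, truth.shape)
denoised = pca_denoise(noisy, n_components=1)
```

Choosing k is the key design decision: too few components discard genuine spectral structure such as velocity gradients, while too many let noise back in; with k equal to the number of channels the cube is returned unchanged.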

Abstract

Beyond cosmic noon, galaxies appear as faint whispers amid noise, yet this epoch is key to understanding massive galaxy assembly. ALMA's sensitivity to cold dust and [C II] emission allows us to probe their interstellar medium, but their faint signals make robust denoising essential. We evaluate and benchmark four denoising strategies, namely Principal Component Analysis (PCA), Independent Component Analysis (ICA), sparse unsupervised representations via iterative soft thresholding (IST) with 2D-1D wavelets, and supervised deep learning with a 3D U-Net, to identify techniques that suppress noise while preserving flux and morphology across peak SNRs of 2.5–8. The methods are applied to (i) synthetic spectral cubes of rotating toy disk galaxies, (ii) synthetic [C II] IFU cubes from FIRE simulations, and (iii) ALMA [C II] observations of CRISTAL galaxies and W2246-0526. Performance is assessed via RMSE, conservation of flux and spectra, noise reduction, and SNR improvement of the central galaxy. For synthetic cubes, PCA and ICA provide marginal improvement; IST reduces noise effectively at moderate SNRs but can suppress emission at low SNRs; and the U-Net outperforms IST, though it can produce quantifiable hallucinations at lower SNRs. For moderate-SNR observations (ALMA-CRISTAL), the U-Net and IST achieve comparable performance, conserving >91% of the flux and increasing the SNR by >6. However, for observations with complex morphologies absent from the training set (W2246), the U-Net underperforms relative to IST, recovering ~80% of the flux, whereas IST robustly conserves flux and improves the SNR by ~3; this highlights generalisation challenges and the need for physically motivated training priors. We conclude that IST is a robust unsupervised denoiser for moderate-SNR data, and that a synthetically trained U-Net generalises effectively to real data, although its performance depends on the training priors. This framework offers a pathway to transferable denoising for ALMA, VLT/MUSE, and JWST.
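The sparse approach can be sketched as follows, under loudly stated assumptions: the isotropic undecimated (starlet) wavelet with a B3-spline filter, applied as a 2D transform over the spatial axes and then a 1D transform along the spectral axis of each spatial band; periodic boundaries; white Gaussian noise of known σ; and a residual-driven loop that soft-thresholds every band except the coarsest at κσ_b, where σ_b is that band's noise level. All names and defaults (`ist_denoise`, κ = 3, three scales per axis, ten iterations) are hypothetical, and the authors' IST may differ in its thresholds, iteration scheme, and noise model (real ALMA noise is spatially correlated).

```python
import numpy as np

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # B3-spline scaling filter


def _smooth(x, axis, step):
    """A trous B3-spline smoothing along one axis, periodic boundaries."""
    out = np.zeros_like(x)
    for shift, h in zip(range(-2, 3), B3):
        out += h * np.roll(x, shift * step, axis=axis)
    return out


def _starlet(x, axes, n_scales):
    """Starlet bands [w_1, ..., w_J, c_J] along `axes`; they sum back to x."""
    bands, c = [], x
    for j in range(n_scales):
        smoothed = c
        for ax in axes:
            smoothed = _smooth(smoothed, ax, 2 ** j)
        bands.append(c - smoothed)
        c = smoothed
    bands.append(c)
    return bands


def transform_2d1d(cube, j_xy, j_z):
    """2D starlet over the spatial axes (1, 2), then 1D starlet along axis 0."""
    return [_starlet(b, (0,), j_z) for b in _starlet(cube, (1, 2), j_xy)]


def reconstruct(coeffs):
    """Invert transform_2d1d by summing every band."""
    return sum(sum(row) for row in coeffs)


def ist_denoise(cube, sigma, j_xy=3, j_z=3, kappa=3.0, n_iter=10):
    """Residual-driven iterative soft thresholding for white noise of std sigma."""
    delta = np.zeros_like(cube)
    delta[tuple(n // 2 for n in cube.shape)] = 1.0
    # For white noise, each band's std is sigma times its impulse-response norm.
    norms = [[np.linalg.norm(b) for b in row]
             for row in transform_2d1d(delta, j_xy, j_z)]
    x = np.zeros_like(cube)
    for _ in range(n_iter):
        coeffs = transform_2d1d(cube - x, j_xy, j_z)
        for i, row in enumerate(coeffs):
            for j, w in enumerate(row):
                if i == j_xy and j == j_z:
                    continue  # leave the coarsest (smooth) band untouched
                lam = kappa * sigma * norms[i][j]
                row[j] = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
        x = x + reconstruct(coeffs)
    return x


# Hypothetical toy cube: a compact source with a Gaussian line, peak SNR ~3.
rng = np.random.default_rng(1)
z, yy, xx = np.mgrid[:32, :32, :32]
truth = np.exp(-0.5 * ((xx - 16) ** 2 + (yy - 16) ** 2) / 3.0 ** 2
               - 0.5 * (z - 16) ** 2 / 4.0 ** 2)
noisy = truth + rng.normal(0.0, 0.3, truth.shape)
denoised = ist_denoise(noisy, sigma=0.3)
```

Because every band is reconstructed by simple summation, the transform is exactly invertible, and the soft threshold shrinks each surviving coefficient by κσ_b. That shrinkage is a bias, which is one motivation for iterating on the residual rather than thresholding only once.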
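The evaluation metrics named in the abstract can be written in a few lines. The definitions below are assumptions made for illustration: flux is summed over all channels inside a boolean spatial aperture, and SNR is the moment-0 peak inside the aperture divided by the standard deviation of the moment-0 map outside it. The paper's exact apertures, masks, and SNR definition (including whether an SNR increase is reported as a difference or a ratio) are not specified in this summary, and the function names are hypothetical.

```python
import numpy as np


def rmse(estimate: np.ndarray, truth: np.ndarray) -> float:
    """Root-mean-square error over all voxels (needs a noise-free reference)."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


def flux_fraction(denoised: np.ndarray, reference: np.ndarray,
                  mask: np.ndarray) -> float:
    """Flux summed over all channels inside a boolean (ny, nx) aperture,
    denoised relative to reference (e.g. the input cube for real data)."""
    return float(denoised[:, mask].sum() / reference[:, mask].sum())


def peak_snr(cube: np.ndarray, mask: np.ndarray) -> float:
    """Peak of the moment-0 map inside the aperture divided by the
    standard deviation of the moment-0 map outside it."""
    mom0 = cube.sum(axis=0)
    return float(mom0[mask].max() / mom0[~mask].std())


# Hypothetical example: the same source with two noise levels.
rng = np.random.default_rng(2)
z, yy, xx = np.mgrid[:20, :48, :48]
truth = np.exp(-0.5 * ((xx - 24) ** 2 + (yy - 24) ** 2) / 3.0 ** 2
               - 0.5 * (z - 10) ** 2 / 3.0 ** 2)
aperture = (xx[0] - 24) ** 2 + (yy[0] - 24) ** 2 <= 8 ** 2
noise = rng.normal(0.0, 0.5, truth.shape)
noisy, less_noisy = truth + noise, truth + 0.1 * noise
gain = peak_snr(less_noisy, aperture) / peak_snr(noisy, aperture)
```

RMSE requires a ground-truth cube, so it only applies to the synthetic sets; flux fraction and SNR can also be computed on real observations, where the reference is the original noisy cube.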

Paper Structure

This paper contains 42 sections, 32 equations, 20 figures, and 1 table.

Figures (20)

  • Figure 1: Top Left: Moment 0 map (integrated flux along the line of sight) of a system with one large central galaxy and two satellites, each inclined differently in the field of view, convolved with a two-dimensional Gaussian beam with FWHM = 3.75 px $\sim 3.12\,\rm kpc$; Top Right: Moment 1 map (intensity-weighted mean line-of-sight velocity of each spaxel) within the emission region, depicting the kinematics of the system; Bottom: Spatially integrated line-of-sight velocity spectrum of the system within the emission region, showing a double-horned profile produced by the modeled rotation of the galaxies.
  • Figure 2: Simulation results for an unresolved system of galaxies in a larger field of view, convolved with a two-dimensional Gaussian beam with FWHM = 3.75 px $\sim 15.63\,\rm kpc$. Top Left: Moment 0 map; Top Right: Moment 1 map depicting the kinematics of the system; Bottom: Line-of-sight velocity spectrum of the system within the emission region, showing a peak at the central velocity and only weak kinematic features on either side, because the source spans few pixels in the field of view.
  • Figure 3: Visualisation of one spectral slice of a system with one galaxy at varying resolutions and levels of spatially correlated Gaussian noise.
  • Figure 4: Top: Moment 0 map of the original full-resolution mock IFU cube from the FIRE simulation in $\rm Jy/px$; Center Left: Moment 0 map after beam convolution; Center Right: Moment 1 map depicting the kinematics; Bottom: Line-of-sight velocity spectrum showing a broad peak centred on the central velocity with multiple minor peaks.
  • Figure 5: Top Left: Moment 0 map of CRISTAL-02 (resolved by the beam), and Top Right: CRISTAL-19 (unresolved by the beam); Bottom: The corresponding line-of-sight spatially integrated velocity spectra for the two spectral cubes within the emission apertures. These examples show the high levels of noise and variable resolutions explored in this survey.
  • ...and 15 more figures
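The moment maps and spectra in these captions follow standard definitions: moment 0 is $M_0 = \sum_i I_i\,\Delta v$ and moment 1 is the intensity-weighted mean velocity $M_1 = \sum_i I_i v_i / \sum_i I_i$ for each spaxel. Below is a minimal NumPy sketch, assuming a (channel, y, x) cube with uniformly spaced velocity channels; the function names, the optional clipping threshold, and the toy velocity gradient are hypothetical.

```python
import numpy as np


def moment_maps(cube, velocity, clip=None):
    """Moment 0 and moment 1 maps of a (n_chan, ny, nx) cube.

    velocity: (n_chan,) channel centres, assumed uniformly spaced (km/s).
    clip: optional intensity threshold; voxels at or below it are excluded
    from moment 1 only, to limit noise-driven velocities.
    """
    dv = abs(velocity[1] - velocity[0])
    mom0 = cube.sum(axis=0) * dv                        # integrated flux
    w = cube if clip is None else np.where(cube > clip, cube, 0.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        mom1 = (w * velocity[:, None, None]).sum(axis=0) / w.sum(axis=0)
    return mom0, mom1


def integrated_spectrum(cube, mask):
    """Sum the spaxels inside a boolean (ny, nx) aperture, channel by channel."""
    return cube[:, mask].sum(axis=1)


# Hypothetical cube: a 20 km/s wide line whose centre shifts along x.
velocity = np.linspace(-200.0, 200.0, 81)               # 5 km/s channels
v_los = 10.0 * (np.arange(16) - 8)                       # velocity gradient
cube = np.exp(-0.5 * ((velocity[:, None, None] - v_los[None, None, :]) / 20.0) ** 2)
cube = np.broadcast_to(cube, (81, 16, 16)).copy()
mom0, mom1 = moment_maps(cube, velocity)
spectrum = integrated_spectrum(cube, np.ones((16, 16), dtype=bool))
```

On a rotating disk, summing spaxels with `integrated_spectrum` stacks line profiles centred at a range of velocities, which can produce the double-horned spectrum seen in Figure 1 when the disk is well resolved.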
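Figures 1 to 3 rely on beam convolution and beam-correlated noise. For a Gaussian beam, $\sigma = \mathrm{FWHM}/(2\sqrt{2\ln 2}) \approx 3.75/2.355 \approx 1.59$ px, and the quoted physical scales imply about $3.12/3.75 \approx 0.83$ kpc per pixel in Figure 1 and $15.63/3.75 \approx 4.17$ kpc per pixel in Figure 2. The sketch below assumes periodic FFT convolution and generates correlated noise by smoothing white noise with the same beam and rescaling it to a target RMS; this is one common recipe, not necessarily the authors', and all names are hypothetical.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # ~0.4247


def beam_transfer(ny, nx, fwhm_px):
    """Fourier transform of a unit-sum circular Gaussian beam on a periodic grid."""
    sigma = fwhm_px * FWHM_TO_SIGMA
    ky = np.fft.fftfreq(ny)[:, None]      # spatial frequency, cycles per pixel
    kx = np.fft.fftfreq(nx)[None, :]
    return np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * (kx ** 2 + ky ** 2))


def convolve_beam(cube, fwhm_px):
    """Convolve every channel of a (n_chan, ny, nx) cube with the beam (periodic)."""
    B = beam_transfer(cube.shape[1], cube.shape[2], fwhm_px)
    return np.fft.ifft2(np.fft.fft2(cube, axes=(1, 2)) * B, axes=(1, 2)).real


def correlated_noise(shape, fwhm_px, rms, rng):
    """White Gaussian noise smoothed by the beam, rescaled to the requested RMS."""
    n = convolve_beam(rng.standard_normal(shape), fwhm_px)
    return n * (rms / n.std())


rng = np.random.default_rng(3)
noise = correlated_noise((8, 64, 64), fwhm_px=3.75, rms=1.0, rng=rng)
# Correlation between horizontally adjacent pixels of the smoothed noise.
lag1_corr = np.corrcoef(noise[:, :, :-1].ravel(), noise[:, :, 1:].ravel())[0, 1]
```

Because the beam's transfer function equals 1 at zero frequency, the convolution conserves the total flux in each channel, and neighbouring pixels of the smoothed noise are strongly correlated, as in interferometric images.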