Hydrogen intensity mapping with MeerKAT: Preserving cosmological signal by optimising contaminant separation

Isabella P. Carucci, José L. Bernal, Steven Cunnington, Mario G. Santos, Jingying Wang, José Fonseca, Keith Grainge, Melis O. Irfan, Yichao Li, Alkistis Pourtsidou, Marta Spinelli, Laura Wolz

TL;DR

This work tackles the critical problem of foreground removal in HI intensity mapping by evaluating PCA/SVD-based pipelines and introducing a multiscale PCA (mPCA) framework to preserve cosmological signal while suppressing contaminants. Using MeerKAT L-band data at z≈0.4 cross-correlated with WiggleZ galaxies, the authors demonstrate that cleaning contaminants within a conservative footprint and treating large and small angular scales independently yields robust detections with minimal signal loss, without requiring transfer-function corrections. The study shows that mPCA outperforms traditional PCA in cross-correlation amplitude, stability across k-scales, and variance reduction, marking a key methodological advance for upcoming MeerKAT and SKAO HI intensity mapping efforts. These results enhance confidence in using HI intensity maps to probe large-scale structure and demonstrate the practical viability of optimized, scale-aware foreground separation for future cosmological analyses.

Abstract

Removing contaminants is a delicate, yet crucial step in neutral hydrogen (HI) intensity mapping and often considered the technique's greatest challenge. Here, we address this challenge by analysing HI intensity maps of about $100$ deg$^2$ at redshift $z\approx0.4$ collected by the MeerKAT radio telescope, an SKA Observatory (SKAO) precursor, with a combined 10.5-hour observation. Using unsupervised statistical methods, we removed the contaminating foreground emission and systematically tested, step-by-step, some common pre-processing choices to facilitate the cleaning process. We also introduced and tested a novel multiscale approach: the data were redundantly decomposed into subsets referring to different spatial scales (large and small), where the cleaning procedure was performed independently. We confirm the detection of the HI cosmological signal in cross-correlation with an ancillary galactic data set, without the need to correct for signal loss. In the best set-up we achieved, we were able to constrain the HI distribution through the combination of its cosmic abundance ($\Omega_{\rm HI}$) and linear clustering bias ($b_{\rm HI}$) up to a cross-correlation coefficient ($r$). We measured $\Omega_{\rm HI}b_{\rm HI}r = [0.93 \pm 0.17]\,\times\,10^{-3}$ with a $\approx 6\sigma$ confidence, which is independent of scale cuts at both edges of the probed scale range ($0.04 \lesssim k \lesssim 0.3 \,h$ Mpc$^{-1}$), corroborating its robustness. Our new pipeline has successfully found an optimal compromise in separating contaminants without incurring a catastrophic signal loss. This development instills an added degree of confidence in the outstanding science we can deliver with MeerKAT on the path towards HI intensity mapping surveys with the full SKAO.
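The PCA-type cleaning referred to above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `pca_clean`, the toy spectra, the amplitudes and the noise level are all assumptions chosen for illustration. The idea is that spectrally smooth contaminants dominate the first few eigenmodes of the frequency-frequency covariance, so projecting those modes out leaves a residual dominated by the fainter, spectrally uncorrelated component.

```python
import numpy as np

def pca_clean(cube, n_fg):
    """Remove the n_fg dominant frequency modes from a (n_freq, n_pix) cube.

    Returns the cleaned cube and the eigenvalues normalised to the largest one.
    """
    # Mean-centre each frequency channel across pixels.
    x = cube - cube.mean(axis=1, keepdims=True)
    # Frequency-frequency covariance matrix, shape (n_freq, n_freq).
    cov = x @ x.T / x.shape[1]
    # eigh returns eigenvalues in ascending order; reverse to descending.
    evals, evecs = np.linalg.eigh(cov)
    evals, evecs = evals[::-1], evecs[:, ::-1]
    mixing = evecs[:, :n_fg]       # dominant spectral modes, (n_freq, n_fg)
    components = mixing.T @ x      # their sky maps, (n_fg, n_pix)
    return x - mixing @ components, evals / evals[0]

# Toy data (all values are illustrative assumptions, not MeerKAT data).
rng = np.random.default_rng(0)
n_freq, n_pix = 64, 2000
nu = np.linspace(971.0, 1023.0, n_freq)  # MHz, the band quoted for Figure 1
# Two smooth power-law spectra standing in for foregrounds, shape (n_freq, 2).
spectra = np.stack([(nu / 1000.0) ** -2.7, (nu / 1000.0) ** -2.1], axis=1)
amplitudes = rng.normal(scale=100.0, size=(2, n_pix))
signal = rng.normal(scale=0.1, size=(n_freq, n_pix))  # faint, uncorrelated in frequency
cube = spectra @ amplitudes + signal

cleaned, norm_evals = pca_clean(cube, n_fg=2)
print(f"std before cleaning: {cube.std():.3g}")
print(f"std after cleaning:  {cleaned.std():.3g}")
print(f"std of toy signal:   {signal.std():.3g}")
```

In this toy case the contaminants are exactly rank two, so two modes suffice. On real data the number of removed modes is a tuning choice: too few leaves contamination, too many removes cosmological signal along with it.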

Paper Structure

This paper contains 49 sections, 29 equations, 24 figures, 1 table.

Figures (24)

  • Figure 1: Left panel: Temperature sky map averaged along the frequency range considered, i.e. $971 < \nu < 1023$ MHz. We highlight the footprint of the WiggleZ galaxies in magenta, the smaller footprint CF where we perform the analysis of this work in cyan, and in yellow the Tukey window function we use for the power spectrum computations (dashed and dotted for the zero and $50\%$ boundaries). Right panel: Normalised histograms of the sky temperature of the data cubes for the original footprint (OF) in solid blue with respect to the cropped CF in dashed orange. The histograms are computed from the average map for each cube, to marginalise over the frequency-dependent evolution of the temperature field. The double-peak structure reflects the galactic synchrotron gradient (low versus high R.A.) present in our sky patch, as shown in the left panel.
  • Figure 2: Flowchart describing the multiscale contaminant subtraction.
  • Figure 3: First two panels: Wavelet-filtered large (first panel) and small scale (second) temperature sky map within the conservative footprint (CF) and averaged along all frequencies; i.e. the sum of the two maps above gives precisely the original one (within the cyan contour) in Fig. 1. Third panel: Spherically averaged power spectra of the cubes (large scale in solid blue, small in dashed orange) normalised by the variance of each cube. With a green dotted line, we plot the damping term of the telescope beam (refer to the $y$-axis at the right of the panel). Bottom panels: Cylindrical power spectra of the large (left panel) and small-scale cubes (right). Since the wavelet filtering is performed in the angular direction, the difference among the resulting cubes is mostly visible along $k_\perp$ ($x$-axis). The beam suppression also acts in this direction: we plot its expected damping term with iso-contour solid lines corresponding to 50, 20, and $5\%$ suppression, from left to right.
  • Figure 4: Normalised eigenvalues of the frequency-frequency covariance matrix of the data cube. Circles refer to the original cube, plus signs to the wavelet-filtered large-scale cube, and 'x' crosses to the small-scale cube. The large-scale eigenvalues drop faster with $N_{\rm fg}$: the PCA assumption holds better in this case, and a few modes are enough to describe the large-scale data set. These values correspond to mean-centred maps cropped on the conservative footprint, although the eigenvalue behaviour does not change significantly for the other cases.
  • Figure 5: First four components $\hat{\textbf{S}}_{1i}$ to $\hat{\textbf{S}}_{4i}$ (from top to bottom) in the case where we apply PCA on the original footprint OF (first column) or on the conservative footprint CF (third column) and in the cases where we apply a weighted PCAw (second and fourth columns). All plotted maps are normalised to highlight each component's spatial features and pixel variance. Component-number (row) wise, we impose the same range of values as per the colour bars on the right. Some sources get saturated, highlighting the higher pixel variance they capture than the unsaturated counterparts (especially for the second component). The black contours in the OF maps in the first two columns mark the CF boundaries.
  • ...and 19 more figures
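
The multiscale idea in the captions of Figures 2 to 4 (split the cube into large and small angular scales that sum back to the original, clean each independently, then recombine) can be sketched as follows. This is a hedged illustration only: the paper uses a wavelet filter, whereas this sketch substitutes a simple Gaussian low-pass applied in Fourier space, and the names `split_scales`, `multiscale_clean`, `sigma_pix`, `n_fg_large` and `n_fg_small`, as well as all toy values, are hypothetical.

```python
import numpy as np

def pca_clean(cube2d, n_fg):
    """Project the n_fg dominant frequency modes out of a (n_freq, n_pix) cube."""
    x = cube2d - cube2d.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    _, evecs = np.linalg.eigh(cov)          # ascending eigenvalue order
    modes = evecs[:, ::-1][:, :n_fg]        # keep the n_fg largest
    return x - modes @ (modes.T @ x)

def split_scales(cube, sigma_pix):
    """Split a (n_freq, ny, nx) cube so that large + small == cube.

    Assumption: a periodic Gaussian low-pass stands in for the wavelet filter.
    """
    ny, nx = cube.shape[1:]
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    # Fourier transform of a Gaussian kernel with width sigma_pix pixels.
    kernel = np.exp(-2.0 * (np.pi * sigma_pix) ** 2 * (kx ** 2 + ky ** 2))
    large = np.fft.ifft2(np.fft.fft2(cube, axes=(1, 2)) * kernel, axes=(1, 2)).real
    return large, cube - large

def multiscale_clean(cube, sigma_pix, n_fg_large, n_fg_small):
    """Clean the large- and small-scale cubes independently, then recombine."""
    n_freq = cube.shape[0]
    large, small = split_scales(cube, sigma_pix)
    clean_large = pca_clean(large.reshape(n_freq, -1), n_fg_large)
    clean_small = pca_clean(small.reshape(n_freq, -1), n_fg_small)
    return (clean_large + clean_small).reshape(cube.shape)

# Toy cube: a smooth sky gradient with a power-law spectrum plus white noise.
rng = np.random.default_rng(1)
n_freq, ny, nx = 32, 24, 24
nu = np.linspace(971.0, 1023.0, n_freq)
gradient = 100.0 * (1.0 + np.arange(nx) / nx) * np.ones((ny, 1))
foreground = ((nu / 1000.0) ** -2.7)[:, None, None] * gradient
cube = foreground + rng.normal(scale=0.1, size=(n_freq, ny, nx))

large, small = split_scales(cube, sigma_pix=3.0)
cleaned = multiscale_clean(cube, sigma_pix=3.0, n_fg_large=2, n_fg_small=1)
print(f"std before: {cube.std():.3g}, after: {cleaned.std():.3g}")
```

Allowing a different number of removed modes per scale is the point of the design: the Figure 4 caption notes that the large-scale eigenvalues drop faster, so the two cubes need not be treated with the same aggressiveness. The mode counts used here are arbitrary toy values.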