From Redshift to Real Space: Combining Linear Theory With Neural Networks

Edoardo Maragliano, Punyakoti Ganeshaiah Veena, Giulia Degni, Enzo Franco Branchini

TL;DR

This paper tackles redshift-space distortions in large-scale structure analyses by proposing a hybrid LT+NN reconstruction that merges physics-based linear theory with a neural network to map redshift-space halo fields to real space. The LT component corrects large-scale distortions while the NN learns quasi-linear and small-scale corrections. Trained on 100 $z=1$ Quijote halo catalogs, the hybrid yields approximately $50\%$ lower MSE than LT alone and $\approx12\%$ lower than NN alone, with a cross-correlation to the true real-space field near unity ($r \approx 0.98$). It also improves two-point statistics, including BAO-scale features, and void measurements, with modest training data and compute cost, demonstrating a synergistic benefit from combining analytical models with machine learning. The results indicate that the LT+NN hybrid can robustly reconstruct real-space fields from redshift-space data and holds promise for application to upcoming wide-field galaxy surveys, subject to validation on more realistic datasets and survey conditions.

Abstract

Spectroscopic redshift surveys are key tools to trace the large-scale structure (LSS) of the Universe and test the $\Lambda$CDM model. However, using redshifts as distance proxies introduces distortions in the 3D galaxy distribution. If uncorrected, these distortions lead to systematic errors in LSS analyses and cosmological parameter estimation. We present a new method that combines linear theory (LT) and a neural network (NN) to mitigate redshift-space distortions (RSDs). The hybrid LT+NN approach is trained and validated on dark matter halo fields from $z = 1$ snapshots of the Quijote N-body simulations. LT corrects large-scale distortions in the linear regime, while the NN learns quasi-linear and small-scale features. The LT correction is applied first, then the NN is trained on the resulting fields to improve accuracy across scales. The method uses a Mean Squared Error (MSE) loss and yields significant performance gains: approximately 50% improvement over LT alone and 12% over NN alone. The reconstructed fields from the LT+NN method show stronger correlations with the true real-space fields than either LT or NN separately. The hybrid method also improves clustering statistics such as halo-halo and halo-void correlations, with benefits extending to BAO scales. Compared to NN-only, it provides better suppression of spurious anisotropies on large and quasi-linear scales, as measured by the quadrupole moments of correlation functions. This work shows that combining a physically motivated dynamical model with a machine learning algorithm leverages the strengths of both approaches. The LT+NN method achieves high accuracy with modest training data and computational cost, making it a promising tool for future applications to more realistic galaxy surveys.
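
The two summary metrics quoted above, the MSE and the correlation with the true real-space field, can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names are hypothetical, the correlation is assumed to be the Pearson coefficient taken over all grid cells, and the input fields are random toy data rather than Quijote halo fields.

```python
import numpy as np


def mse(recon, truth):
    """Mean squared error between two gridded fields of the same shape."""
    recon = np.asarray(recon, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((recon - truth) ** 2))


def cross_correlation(recon, truth):
    """Pearson correlation coefficient computed over all grid cells (assumed definition of r)."""
    a = np.asarray(recon, dtype=float).ravel()
    b = np.asarray(truth, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))


def relative_improvement(mse_baseline, mse_new):
    """Fractional MSE reduction of a new method relative to a baseline."""
    return (mse_baseline - mse_new) / mse_baseline


# Toy data only: a random "true" field plus noise of two different amplitudes,
# standing in for a worse and a better reconstruction.
rng = np.random.default_rng(0)
truth = rng.normal(size=(16, 16, 16))
noisy = truth + 0.5 * rng.normal(size=truth.shape)
cleaner = truth + 0.2 * rng.normal(size=truth.shape)

print("MSE (noisy):", mse(noisy, truth))
print("MSE (cleaner):", mse(cleaner, truth))
print("r (cleaner):", cross_correlation(cleaner, truth))
print("relative improvement:", relative_improvement(mse(noisy, truth), mse(cleaner, truth)))
```

Under this reading, a quoted "50% improvement over LT alone" corresponds to `relative_improvement(mse_lt, mse_lt_nn)` being about 0.5, where `mse_lt` and `mse_lt_nn` are placeholder names for the two validation losses.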

Paper Structure

This paper contains 20 sections, 10 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Mean squared error values as a function of training epochs for the different reconstruction methods considered in this work: NN (blue symbols and curves), LT+NN (red), and LT (gray horizontal line). Dots and crosses indicate results obtained with the training and validation sets, respectively. LT reconstructions were performed using Gaussian smoothing with a radius of $R_s = 10\, h^{-1}\,\mathrm{Mpc}$.
  • Figure 2: Residuals between the density fields obtained with the three reconstruction methods, identified by the top labels, and the real-space density field. The plots show the density in a slice of thickness $7.8\, h^{-1}\,\mathrm{Mpc}$ from one of the validation fields, extracted along the $z$ axis of the cube and expressed in number of halos per cell.
  • Figure 3: Scatter plot of the true vs. the reconstructed halo number density -- in units of halos per cell -- measured at the points of the $128^3$ grid for the three reconstruction methods considered: LT (left), NN (middle) and LT+NN (right). The plot was made by sampling points from the 20 fields of the validation set.
  • Figure 4: Probability distribution functions (PDFs) of the reconstructed density fields, normalized by the mean, for the three methods (LT, NN, and LT+NN) are shown alongside the true distribution and labeled accordingly. The plot shows the average over 20 validation samples and displays the PDFs in lin-log scales, to better highlight discrepancies.
  • Figure 5: Monopole and quadrupole moments of the power spectrum of the true and reconstructed halo number density fields, computed on $128^3$ grids. In all cases, we show the average over the 20 validation fields. Different colors indicate different types of reconstructions, as specified in the figure legend. The left (right) column shows the monopole (quadrupole): the top panels display the moments and the bottom panels the corresponding residuals with respect to the reference (true) fields. The grey bands in the bottom panels represent the 1$\sigma$ and 2$\sigma$ uncertainty regions, estimated from the scatter among the 20 realizations. LT and LT+NN reconstructions were performed using a Gaussian filter of radius $R_s = 10\,h^{-1}\,\mathrm{Mpc}$.
  • ...and 7 more figures
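
Several captions above mention Gaussian smoothing with radius $R_s = 10\,h^{-1}\,\mathrm{Mpc}$ ahead of the LT step. A minimal sketch of such a filter on a periodic cubic grid is given below. It is an illustration rather than the authors' implementation: the function name `gaussian_smooth` is hypothetical, the kernel convention $W(k) = \exp(-k^2 R_s^2/2)$ is an assumption, and the toy box side is chosen only so that the cell size matches the $7.8\,h^{-1}\,\mathrm{Mpc}$ quoted in the Figure 2 caption.

```python
import numpy as np


def gaussian_smooth(field, box_size, r_smooth):
    """Smooth a periodic cubic field with the kernel W(k) = exp(-k^2 R_s^2 / 2)."""
    n = field.shape[0]
    cell = box_size / n
    # Angular wavenumbers along each axis; the last axis is halved for the real FFT.
    k_full = 2.0 * np.pi * np.fft.fftfreq(n, d=cell)
    k_half = 2.0 * np.pi * np.fft.rfftfreq(n, d=cell)
    kx, ky, kz = np.meshgrid(k_full, k_full, k_half, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    window = np.exp(-0.5 * k2 * r_smooth**2)
    # Multiply in Fourier space and transform back to configuration space.
    return np.fft.irfftn(np.fft.rfftn(field) * window, s=field.shape)


# Toy example: 32^3 cells in a 250 h^-1 Mpc box (cell size of about 7.8 h^-1 Mpc),
# filled with Poisson counts as a stand-in for halos per cell.
rng = np.random.default_rng(1)
counts = rng.poisson(lam=2.0, size=(32, 32, 32)).astype(float)
smoothed = gaussian_smooth(counts, box_size=250.0, r_smooth=10.0)

print("mean before/after:", counts.mean(), smoothed.mean())
print("variance before/after:", counts.var(), smoothed.var())
```

Because $W(0) = 1$, the filter leaves the mean number of halos per cell unchanged and only suppresses fluctuations on scales below roughly $R_s$.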