Learning Cosmology from Nearest Neighbour Statistics

Atrideb Chatterjee, Arka Banerjee, Francisco Villaescusa-Navarro, Tom Abel

TL;DR

This work addresses the challenge of extracting sub-percent cosmological constraints from large-scale structure by moving beyond the 2-point statistic. It introduces Nearest-Neighbour distance maps as a field-level representation of discrete halo data and combines them with NN-CDFs in a hybrid neural network to infer $\Omega_m$ and $\sigma_8$ from Quijote N-body simulations. The proposed Map+CDF architecture achieves state-of-the-art accuracy with substantially lower computational cost than competing point-cloud approaches, and outperforms 2-point-function baselines by a large margin. This approach is particularly well-suited for future surveys with massive galaxy catalogs, though incorporating halo mass/velocity and redshift-space distortions is identified as a path for further tightening constraints.

Abstract

Extracting cosmological parameters from galaxy/halo catalogues with sub-percent level accuracy is an important aspect of modern cosmology, especially in view of ongoing and upcoming surveys such as Euclid, DESI, and LSST. While traditional two-point statistics are known to be suboptimal for this task, recently proposed k-Nearest Neighbour (kNN) based summary statistics have demonstrated tighter constraining power. Building on the kNN statistics, we introduce a new field-level representation of discrete halo catalogues: NN distance maps. We apply this technique to the halo catalogues obtained from the Quijote N-body simulation suite. By combining these maps with kNN-based summary statistics, we train a hybrid neural network to infer cosmological parameters, showing that the resulting constraints achieve state-of-the-art, if not the best, accuracy. In addition, our hybrid framework is 5-10 times more computationally efficient than some of the existing point-cloud-based ML methods.
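To make the idea of an NN distance map concrete, here is a minimal sketch of how such a map could be built for a toy point set. It follows the description in the Figure 1 caption (each pixel holds the distance to its k-th nearest data point), but the details are assumptions rather than the authors' pipeline: the function name `knn_distance_map`, the use of grid-cell centres as query points, the 2D periodic unit box, and the brute-force distance computation are all illustrative choices.

```python
import numpy as np

def knn_distance_map(points, k, n_grid, box_size=1.0):
    """Distance from each grid-cell centre to its k-th nearest point (2D periodic box)."""
    # Query points: cell centres of an n_grid x n_grid grid (an assumption for this sketch).
    centres = (np.arange(n_grid) + 0.5) * box_size / n_grid
    qx, qy = np.meshgrid(centres, centres, indexing="ij")
    queries = np.stack([qx.ravel(), qy.ravel()], axis=1)       # shape (Q, 2)

    # Pairwise separations using the minimum-image convention for periodic boundaries.
    delta = np.abs(queries[:, None, :] - points[None, :, :])   # shape (Q, N, 2)
    delta = np.minimum(delta, box_size - delta)
    dist = np.sqrt((delta ** 2).sum(axis=-1))                  # shape (Q, N)

    # k-th smallest distance for every query point (k is 1-indexed).
    kth = np.partition(dist, k - 1, axis=1)[:, k - 1]
    return kth.reshape(n_grid, n_grid)

rng = np.random.default_rng(0)
halos = rng.uniform(0.0, 1.0, size=(500, 2))   # toy positions, not Quijote halos
map_1nn = knn_distance_map(halos, k=1, n_grid=32)
map_4nn = knn_distance_map(halos, k=4, n_grid=32)
```

The brute-force distance matrix keeps the sketch dependency-free but scales as the number of query points times the number of data points; for catalogue-sized inputs a spatial tree would be the more practical choice.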

Paper Structure

This paper contains 14 sections, 3 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: A 2D slice of the 1st (left) and 4th (right) Nearest neighbour distance maps for one of the simulations in the Quijote simulation suites used in this study. Each pixel is coloured by the distance from the pixel to the nearest data point on the left panel, and by the distance to the 4th nearest neighbour data point in the right panel. As can be seen, this converts the discrete dataset into a smooth, continuous map. The colorbar represents the distance (in Gpc/$h$) from the halos. Note that these maps are only for the purpose of visualization. They are produced with $256^2$ random query points in a $256 \times 256$ 2D grid, whereas the actual maps used in this study are produced with $10^2$ random query points in a $100 \times 100$ 2D grid, as described in the subsection on NN maps.
  • Figure 2: The CDF (left panel) and Peaked CDF (right panel) for 1NN (orange), 2NN (red), 3NN (magenta), and 4NN (blue) corresponding to one of the Quijote simulations in this study.
  • Figure 3: Hybrid Network in this study. The NN distance maps are used as input to the ResNet block. The output of the ResNet is then concatenated with the NN CDFs, and the merged input then passes through the inference blocks (containing several Linear, ReLU, and dropout layers) to predict the mean and standard deviation of the inferred cosmological parameters. The values in brackets show the dimension of the tensor in different stages of the architecture. Here, B denotes the batch dimension.
  • Figure 4: Performance of the different models when trained for likelihood-free inference of $\Omega_{m}$ (left column) and $\sigma_{8}$ (right column) in three scenarios: 1) top row: CDF-only; 2) middle row: Map-only; 3) bottom row: Map+CDF. The values of the validation metrics are given in the legend. As can be seen, the Map-only scenario (middle row) performs worse than the CDF-only scenario (top row). Further, the Map+CDF model performs best across all validation metrics.
  • Figure 5: Comparison between $\xi(r)$-Only and CDF-Only. As shown, CDF-Only performs much better than $\xi(r)$-Only, as expected from Banerjee_2021a.
  • ...and 1 more figures
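
The CDF and Peaked CDF named in the Figure 2 caption can be illustrated with a short sketch. It assumes the definition of the peaked CDF commonly used in the kNN literature (the CDF where it is at most 0.5, and one minus the CDF otherwise); the source does not spell out the exact definition, so treat this as an assumption. The helper names `empirical_cdf` and `peaked_cdf` and the toy distances are hypothetical.

```python
import numpy as np

def empirical_cdf(distances, radii):
    """Fraction of query points whose kNN distance is <= each radius."""
    sorted_d = np.sort(np.ravel(distances))
    return np.searchsorted(sorted_d, radii, side="right") / sorted_d.size

def peaked_cdf(cdf):
    """Fold the CDF about 0.5 so that both tails of the distribution are visible."""
    return np.where(cdf <= 0.5, cdf, 1.0 - cdf)

# Toy kNN distances (hypothetical values, not taken from the simulations).
d = np.array([0.1, 0.2, 0.3, 0.4])
r = np.array([0.05, 0.25, 0.45])
cdf = empirical_cdf(d, r)    # -> [0.0, 0.5, 1.0]
pcdf = peaked_cdf(cdf)       # -> [0.0, 0.5, 0.0]
```

In practice the `distances` argument would be the flattened values of a kNN distance map for a given k, evaluated on a fixed set of radii so that the resulting vectors are comparable across simulations.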