Learning Cosmology from Nearest Neighbour Statistics
Atrideb Chatterjee, Arka Banerjee, Francisco Villaescusa-Navarro, Tom Abel
TL;DR
This work addresses the challenge of extracting sub-percent cosmological constraints from large-scale structure by moving beyond two-point statistics. It introduces Nearest-Neighbour (NN) distance maps as a field-level representation of discrete halo data and combines them with NN cumulative distribution functions (NN-CDFs) in a hybrid neural network to infer $\Omega_m$ and $\sigma_8$ from Quijote N-body simulations. The proposed Map+CDF architecture achieves state-of-the-art accuracy at substantially lower computational cost than competing point-cloud approaches, and outperforms two-point-function baselines by a large margin. The approach is particularly well suited to future surveys with massive galaxy catalogues; incorporating halo mass, halo velocity, and redshift-space distortions is identified as a path to tightening the constraints further.
Abstract
Extracting cosmological parameters from galaxy/halo catalogues with sub-percent accuracy is an important goal of modern cosmology, especially in view of ongoing and upcoming surveys such as Euclid, DESI, and LSST. Traditional two-point statistics are known to be suboptimal for this task, whereas recently proposed k-Nearest Neighbour (kNN) based summary statistics have demonstrated tighter constraints. Building on the kNN statistics, we introduce a new field-level representation of discrete halo catalogues: NN distance maps. We apply this technique to halo catalogues from the Quijote suite of N-body simulations. By combining these maps with kNN-based summary statistics, we train a hybrid neural network to infer cosmological parameters, and show that the resulting constraints are on par with, if not better than, the current state of the art. In addition, our hybrid framework is 5-10 times more computationally efficient than some existing point-cloud-based ML methods.
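As a rough illustration of the two ingredients named above, the following Python sketch builds a nearest-neighbour distance map and the corresponding NN-CDF from a point catalogue in a periodic box. It is a minimal sketch under stated assumptions, not the pipeline used in this work: the function names `nn_distance_map` and `nn_cdf`, the regular 3D grid of query points, the grid resolution, the box size, and the uniform random mock catalogue are all illustrative choices made here.

```python
import numpy as np
from scipy.spatial import cKDTree


def nn_distance_map(positions, box_size, n_grid, k=1):
    """Distance from each grid-cell centre to its k-th nearest data point.

    Hypothetical helper: the regular 3D grid of query points is an
    assumption of this sketch, not a detail taken from the paper.
    """
    # Periodic KD-tree; positions must lie in [0, box_size).
    tree = cKDTree(positions, boxsize=box_size)
    cell = box_size / n_grid
    centres = (np.arange(n_grid) + 0.5) * cell
    gx, gy, gz = np.meshgrid(centres, centres, centres, indexing="ij")
    queries = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
    # k=[k] returns only the k-th neighbour distance, shape (n_queries, 1).
    dist, _ = tree.query(queries, k=[k])
    return dist[:, 0].reshape(n_grid, n_grid, n_grid)


def nn_cdf(distance_map, radii):
    """Empirical CDF: fraction of query points with NN distance <= r."""
    d = np.sort(distance_map.ravel())
    return np.searchsorted(d, radii, side="right") / d.size


# Mock catalogue: uniform random points stand in for a halo catalogue.
rng = np.random.default_rng(0)
box_size = 1000.0  # assumed box side length, arbitrary units
halos = rng.uniform(0.0, box_size, size=(5000, 3))

dmap = nn_distance_map(halos, box_size, n_grid=32, k=1)
radii = np.linspace(0.0, 0.5 * np.sqrt(3.0) * box_size, 50)
cdf = nn_cdf(dmap, radii)
print(dmap.shape, cdf[0], cdf[-1])
```

In this sketch the map keeps the spatial arrangement of the distances, which is what a convolutional network could consume, while the CDF compresses the same distances into a one-dimensional summary; a hybrid model would take both as inputs.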
