CNN+FoF: application of deep learning to the identification of dark matter haloes
Soumadeep Maiti, Carlos M. Correa, Andrea Fiorilli, Andrés N. Ruiz, Dante J. Paz, Alejandro Pérez Fernández, Ariel G. Sánchez
TL;DR
This study aims to offer a faster, scalable alternative to conventional halo finders. The pipeline achieves a speed-up of approximately one order of magnitude relative to ROCKSTAR, which makes it a promising pathway for modern simulation-based inference methods that rely on rapid and accurate structure identification.
Abstract
We present a deep-learning-based approach for identifying dark matter haloes in cosmological N-body simulations. Our framework consists of a volumetric Convolutional Neural Network (CNN) that classifies individual simulation particles as either halo or non-halo members, followed by a highly optimised and parallelised Friends-of-Friends (FoF) clustering algorithm that groups the classified halo members into distinct haloes. The training data comprise simulations generated using GADGET-4, with labels obtained from the ROCKSTAR halo finder. Our models cover two main halo mass definitions, $M_{200\mathrm{b}}$ and $M_{\text{vir}}$, and perform similarly for both. For haloes defined by the ROCKSTAR $M_{200\mathrm{b}}$ criterion, the classification network shows stable performance across multiple simulation resolutions. At the highest resolution, it achieves over $98\%$ across all primary performance metrics when identifying halo particles. Furthermore, the FoF algorithm yields halo catalogues with a purity generally exceeding $95\%$ and a stable completeness of $93\%$ for masses above $5\times10^{11} \, M_\odot$. Our pipeline recovers the centre-of-mass positions, velocities and halo masses with high fidelity, yielding a halo mass function that agrees with the reference to within $5\%$ while faithfully reconstructing the internal density profiles. The primary objective of this study is to offer a faster, scalable alternative to conventional halo finders. The pipeline achieves a speed-up of approximately one order of magnitude relative to ROCKSTAR, which makes it a promising pathway for modern simulation-based inference methods that rely on rapid and accurate structure identification.
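To make the second stage of the pipeline concrete, the following is a minimal, serial sketch of Friends-of-Friends grouping applied to particles that a classifier has already flagged as halo members. It is not the optimised, parallelised implementation described above: the function name `friends_of_friends`, the `min_members` cut, the linking length value and the neglect of periodic boundaries are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def friends_of_friends(positions, linking_length, min_members=2):
    """Group 3D points into FoF groups (hypothetical helper, non-periodic box).

    Two points are friends if they are closer than linking_length; a group
    is a set of points connected through chains of friends.
    """
    parent = list(range(len(positions)))

    def find(i):
        # Return the root of i's group, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Bin particles into cubic cells of side linking_length, so that any
    # friend of a particle lies in the same cell or one of its 26 neighbours.
    cells = defaultdict(list)
    for idx, pos in enumerate(positions):
        key = tuple(int(c // linking_length) for c in pos)
        cells[key].append(idx)

    ll2 = linking_length ** 2
    offsets = list(product((-1, 0, 1), repeat=3))
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue
            for i in members:
                for j in neighbours:
                    if i >= j:
                        continue  # visit each pair only once
                    d2 = sum((a - b) ** 2
                             for a, b in zip(positions[i], positions[j]))
                    if d2 < ll2:
                        ri, rj = find(i), find(j)
                        if ri != rj:
                            parent[rj] = ri  # merge the two groups

    groups = defaultdict(list)
    for i in range(len(positions)):
        groups[find(i)].append(i)

    # Keep groups with enough members, largest first.
    kept = [g for g in groups.values() if len(g) >= min_members]
    return sorted(kept, key=lambda g: (-len(g), g[0]))


# Toy input: two small clumps and one isolated particle (illustrative only).
halo_particles = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0),
    (9.0, 9.0, 9.0),
]
print(friends_of_friends(halo_particles, linking_length=0.15))
# [[0, 1, 2], [3, 4]]
```

In the toy input, particles 0 and 2 are farther apart than the linking length but end up in the same group through particle 1, which is the defining property of FoF. Because the classifier has already removed non-halo particles, the grouping step only needs to separate neighbouring haloes rather than find them against the background.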
