Distinguishing Orbiting and Infalling Dark Matter Particles with Machine Learning
Ze'ev Vladimir, Calvin Osinga, Benedikt Diemer, Edgar M. Salazar, Eduardo Rozo
TL;DR
This work addresses the artifacts introduced by traditional halo boundaries by defining a halo as the set of particles orbiting within its potential well, and it introduces a scalable classification approach based on two epochs. An XGBoost classifier distinguishes orbiting from infalling particles using six inputs: the normalized radius $r/R_{\rm 200m}$ and the normalized radial and tangential velocities $v_r/V_{\rm 200m}$ and $v_t/V_{\rm 200m}$, each measured at two epochs separated by one dynamical time. Its classifications agree with a trajectory-based ground truth for ~97% of particles. The model reproduces the orbiting and infalling density profiles to within ~5% out to $R_{\rm 200m}$, and it generalizes to a Planck-like cosmology outside the training set, suggesting that its performance is robust to the choice of cosmology. A SHAP analysis shows that the radius features dominate the classification. The method thus offers a fast, scalable alternative to full orbital tracking in large N-body simulations; the code and trained models are publicly released.
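A minimal Python sketch of how the six input features could be assembled for one particle. It assumes that positions (kpc) and velocities (km/s) are already measured relative to the host halo's center and bulk velocity, with no Hubble-flow term added, and that the dynamical time is the crossing time $t_{\rm dyn} = 2R_{\rm 200m}/V_{\rm 200m}$. The function names, units, and input layout are hypothetical and are not taken from the released code.

```python
import math

# Gravitational constant in kpc (km/s)^2 / Msun.
G = 4.30091e-6
KPC_IN_KM = 3.0857e16   # kilometres per kiloparsec
GYR_IN_S = 3.15576e16   # seconds per gigayear (Julian years)


def v200m(m200m, r200m):
    """Circular velocity V_200m = sqrt(G M_200m / R_200m) in km/s (M in Msun, R in kpc)."""
    return math.sqrt(G * m200m / r200m)


def dynamical_time_gyr(r200m, v200m_kms):
    """Dynamical time in Gyr, ASSUMING the crossing-time definition t_dyn = 2 R_200m / V_200m."""
    return 2.0 * r200m * KPC_IN_KM / v200m_kms / GYR_IN_S


def particle_features(pos, vel, r200m, v200m_kms):
    """Return (r/R_200m, v_r/V_200m, v_t/V_200m) for one particle at one epoch.

    pos, vel: 3-vectors relative to the halo center and bulk velocity (assumed inputs).
    """
    r = math.sqrt(sum(x * x for x in pos))
    v_sq = sum(v * v for v in vel)
    # Radial velocity is the projection of vel onto the unit radius vector;
    # it is undefined at r = 0, where we set it to zero.
    v_r = sum(x * v for x, v in zip(pos, vel)) / r if r > 0.0 else 0.0
    # Tangential speed is the remaining part of the speed; clip tiny negative round-off.
    v_t = math.sqrt(max(v_sq - v_r * v_r, 0.0))
    return (r / r200m, v_r / v200m_kms, v_t / v200m_kms)


def two_epoch_features(now, earlier):
    """Concatenate the three features at the current epoch and one t_dyn earlier.

    now, earlier: hypothetical (pos, vel, r200m, v200m) tuples for the same particle
    and its host halo at the two snapshots. Returns a 6-tuple.
    """
    return particle_features(*now) + particle_features(*earlier)


# Usage with illustrative numbers (host properties held fixed for simplicity).
R = 300.0                      # kpc
V = v200m(1.0e12, R)           # about 119.7 km/s
t = dynamical_time_gyr(R, V)   # about 4.9 Gyr
feats = two_epoch_features(
    ((150.0, 0.0, 0.0), (-0.5 * V, V, 0.0), R, V),
    ((450.0, 0.0, 0.0), (-V, 0.0, 0.0), R, V),
)
# feats is approximately (0.5, -0.5, 1.0, 1.5, -1.0, 0.0)
```

Under the crossing-time assumption, $t_{\rm dyn}$ depends only on the mean enclosed density, so it is the same for all halos at a given redshift, and a single snapshot offset can serve an entire simulation.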
Abstract
Dark matter halos are typically defined as spheres that enclose some overdensity, but these sharp, somewhat arbitrary boundaries introduce non-physical artifacts such as backsplash halos, pseudo-evolution, and an incomplete accounting of halo mass. A more physically motivated alternative is to define a halo as the collection of particles that are orbiting within its potential well. However, existing methods to classify particles as orbiting or infalling trade off accuracy, computational cost, and generalizability across cosmologies. We present an efficient yet accurate supervised machine learning approach based on decision trees. The classification uses only the particle radii and velocities at two epochs. Compared with a detailed analysis of particle trajectories, our model assigns the same classification to 97% of particles. Consequently, we can quickly and accurately reproduce the density profiles of the orbiting and infalling components out to many virial radii. We demonstrate that our model generalizes to a significantly different cosmology that lies outside the training dataset. We publicly release both our final model and the code to train similar models.
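Once each particle carries an orbiting or infalling label (from a classifier or from trajectory analysis), the two density profiles follow from counting particles in spherical shells. The sketch below assumes equal-mass particles and user-chosen radial bin edges; the function names are hypothetical, and the fractional-error helper only illustrates one way a predicted profile could be compared with a reference profile.

```python
import bisect
import math


def density_profiles(radii, is_orbiting, particle_mass, edges):
    """Shell-averaged densities of the orbiting and infalling components.

    radii: particle distances from the halo center; is_orbiting: one bool per particle;
    particle_mass: mass of each (equal-mass) particle; edges: increasing bin edges.
    Returns (rho_orbiting, rho_infalling), one value per radial shell.
    """
    n_bins = len(edges) - 1
    n_orb = [0] * n_bins
    n_inf = [0] * n_bins
    for r, orbiting in zip(radii, is_orbiting):
        # Shell i satisfies edges[i] <= r < edges[i + 1]; particles outside are skipped.
        i = bisect.bisect_right(edges, r) - 1
        if 0 <= i < n_bins:
            if orbiting:
                n_orb[i] += 1
            else:
                n_inf[i] += 1
    # Volume of each spherical shell.
    volumes = [4.0 / 3.0 * math.pi * (edges[i + 1] ** 3 - edges[i] ** 3) for i in range(n_bins)]
    rho_orb = [particle_mass * n / v for n, v in zip(n_orb, volumes)]
    rho_inf = [particle_mass * n / v for n, v in zip(n_inf, volumes)]
    return rho_orb, rho_inf


def max_fractional_error(predicted, reference):
    """Largest |predicted - reference| / reference over shells with reference > 0."""
    return max((abs(p - q) / q for p, q in zip(predicted, reference) if q > 0), default=0.0)


# Usage: radii in units of R_200m, labels from some classifier (toy values).
radii = [0.5, 0.5, 1.5, 1.5, 1.5]
labels = [True, False, True, True, False]
rho_orb, rho_inf = density_profiles(radii, labels, 1.0, [0.0, 1.0, 2.0])
```

Running the same binning on predicted and on trajectory-based labels, then comparing the two profiles shell by shell, is one simple way to quantify agreement at the profile level rather than per particle.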
