Distinguishing Orbiting and Infalling Dark Matter Particles with Machine Learning

Ze'ev Vladimir, Calvin Osinga, Benedikt Diemer, Edgar M. Salazar, Eduardo Rozo

TL;DR

Traditional overdensity-based halo boundaries introduce artifacts. This work instead defines a halo as the set of particles orbiting within its potential well and introduces a scalable, two-epoch classification approach. An XGBoost classifier uses six inputs ($r/R_{\rm 200m}$, $v_r/V_{\rm 200m}$, and $v_t/V_{\rm 200m}$ at two epochs separated by one dynamical time) to distinguish orbiting from infalling particles, matching the trajectory-based ground truth for ~97% of particles. It reproduces orbiting and infalling density profiles within ~5% out to $R_{\rm 200m}$ and generalizes to a Planck-like cosmology that lies outside the training dataset. SHAP analysis shows that the radius features dominate the decision. The method offers a fast, scalable alternative to full orbital tracking for large N-body simulations, and the code and models are publicly released.

Abstract

Dark matter halos are typically defined as spheres that enclose some overdensity, but these sharp, somewhat arbitrary boundaries introduce non-physical artifacts such as backsplash halos, pseudo-evolution, and an incomplete accounting of halo mass. A more physically motivated alternative is to define halos as the collection of particles that are physically orbiting within their potential well. However, existing methods to classify particles as orbiting or infalling suffer from trade-offs between accuracy, computational cost, and generalizability across cosmologies. We present an efficient, yet accurate, supervised machine learning approach using decision trees. The classification is based on only the particle radii and velocities at two epochs. Compared to detailed analysis of particle trajectories, we find that our model matches the classification of 97% of particles. Consequently, we are able to quickly and accurately reproduce the density profiles of the orbiting and infalling components out to many virial radii. We demonstrate that our model generalizes to a significantly different cosmology that lies outside the training dataset. We make publicly available both our final model and the code to train similar models.
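
As a minimal sketch of the classifier inputs described above, the following Python computes the scaled radius, radial velocity, and tangential velocity of one particle at two epochs. The function names, the tuple layout, and the choice to scale each epoch by that epoch's own $R_{\rm 200m}$ and $V_{\rm 200m}$ are assumptions made for illustration and are not taken from the released code. Positions and velocities are assumed to be given relative to the halo center.

```python
import math


def phase_space_features(pos, vel, r200m, v200m):
    """Return (r/R200m, v_r/V200m, v_t/V200m) for a single particle.

    pos, vel: 3-component position and velocity relative to the halo
    center (assumed layout). r200m, v200m: halo radius and velocity
    scale at the same epoch, in matching units.
    """
    r = math.sqrt(sum(x * x for x in pos))
    if r == 0.0:
        raise ValueError("radial direction is undefined at the halo center")
    # Radial velocity: projection of the velocity onto the radial unit vector.
    v_r = sum(x * v for x, v in zip(pos, vel)) / r
    # Tangential speed: what remains of |v|^2 after removing the radial part.
    v_sq = sum(v * v for v in vel)
    v_t = math.sqrt(max(v_sq - v_r * v_r, 0.0))
    return r / r200m, v_r / v200m, v_t / v200m


def two_epoch_features(now, past):
    """Concatenate the three features at the current and the past epoch.

    now, past: (pos, vel, r200m, v200m) tuples (hypothetical layout).
    Returns a 6-tuple that could serve as one row of classifier input.
    """
    return phase_space_features(*now) + phase_space_features(*past)
```

Rows built this way could be stacked into a feature matrix for any gradient-boosted tree library. Whether the velocities should include the Hubble flow is not specified in this summary, so that choice is left to the caller.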

Paper Structure

This paper contains 12 sections, 9 figures.

Figures (9)

  • Figure 1: The distribution of particles for a randomly selected dark matter halo from the Bolshoi L0063 simulation (left), separated by Sparta's classifications into orbiting (middle) and infalling (right) particles. The black circle marks $R_{\rm 200m}$ and the green circle marks $4 R_{\rm 200m}$, the radius out to which we search for particles. As expected, orbiting particles preferentially exist near the halo center, while distant particles tend to be infalling. A non-negligible fraction of orbiting particles, 18.5% in this case, exist outside of $R_{\rm 200m}$ for this halo, in agreement with other works [diemer_splashback_2017; diemer_splashback_2017-1].
  • Figure 2: Phase-space distribution of all (top row), infalling (second row), and orbiting particles (third row) from the test dataset, and the ratio between infalling and orbiting (bottom row). The first three rows are normalized by the total number of particles in the dataset and the size of each bin (black signifies an empty bin). The first three columns correspond to the combinations of radii and velocities at the current snapshot, whereas the final column depicts their past radii and radial velocities. We linearly bin the velocity distributions in the inner regions to improve visibility, with logarithmic bins elsewhere (the transition is indicated with gray dashed lines). Each panel is labeled in the top left corner to facilitate discussions in the text. Infalling particles typically reside at large radii and slightly negative radial velocities (D5--D7). Conversely, orbiting particles are concentrated in the halo's center, with larger tangential velocities and an average radial velocity near zero (D9--D11). In the previous snapshot, infalling particles tend to exist at large radii with small radial velocities (D8), as they fail to reach the halo and complete a pericenter before the present snapshot. This is also reflected in the suppressed occupation of orbiting particles in the same region of D12. Regions in phase-space that contain approximately equal numbers of infalling and orbiting particles (white areas in D13--D16) present a challenge for our classifier.
  • Figure 3: Same as Fig. 2 but showing the fraction of misclassified particles in each bin. Areas that have a high concentration of one population correspondingly have low misclassification rates. We suspect that interactions with subhalos lead to the population of infalling particles with a high misclassification rate beyond $0.5R_{\rm 200m}$ and with positive radial velocities (S5). Orbiting particles located outside their areas of high concentration are often misclassified since they are a small, outlier population (S9--S11).
  • Figure 4: Median orbiting (left) and infalling (right) density profiles of all particles from the WMAP7 test dataset. We bin halos by peak height and hide radial bins where fewer than 50% of the total number of halos contain particles, with darker colors representing larger peak heights. The solid lines are density profiles from the model's classifications, and the dashed lines are the profiles from Sparta. The bottom panels show the corresponding ratio between the median profiles, with the shaded region representing the 16th to 84th percentile range of the ratios between the individual halo profiles. Since both our total profiles and Sparta's profiles are constructed from the raw particle count in each radial bin, they must agree, as confirmed in the left panel. The model accurately picks up on the key characteristics of the orbiting and infalling profiles across masses. The increased errors at larger peak heights are largely the result of a smaller population of halos.
  • Figure 5: Same as Fig. 4, but for the Planck-like cosmology. As in the fiducial cosmology, the model closely matches the density profiles based on Sparta except in regions with very few particles. Unlike the Bolshoi cosmology simulations, the Planck cosmology simulations only reach a box size of $500 \: \rm{Mpc}$, which results in too few halos in the $3.0 < \nu < 6.0$ bin.
  • ...and 4 more figures
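
The captions for Figures 4 and 5 compare density profiles built from particle counts in radial bins. A minimal sketch of that construction is given below. The function names, the assumption of equal-mass particles, and the caller-supplied bin edges are illustrative choices, not details taken from the paper's code.

```python
import math


def shell_density(radii, bin_edges, particle_mass):
    """Mass density in each spherical shell defined by consecutive bin edges.

    Assumes equal-mass particles and ascending bin edges; particles
    outside the outermost edge are ignored.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for r in radii:
        for i in range(n_bins):
            if bin_edges[i] <= r < bin_edges[i + 1]:
                counts[i] += 1
                break
    densities = []
    for i in range(n_bins):
        # Volume of the shell between the inner and outer edge of bin i.
        volume = 4.0 / 3.0 * math.pi * (bin_edges[i + 1] ** 3 - bin_edges[i] ** 3)
        densities.append(counts[i] * particle_mass / volume)
    return densities


def split_profiles(radii, is_orbiting, bin_edges, particle_mass):
    """Return (orbiting, infalling) density profiles from per-particle labels."""
    orbiting = [r for r, flag in zip(radii, is_orbiting) if flag]
    infalling = [r for r, flag in zip(radii, is_orbiting) if not flag]
    return (
        shell_density(orbiting, bin_edges, particle_mass),
        shell_density(infalling, bin_edges, particle_mass),
    )
```

Because every particle receives exactly one label, the orbiting and infalling profiles sum to the total profile in each bin regardless of which classifier supplied the labels. Only the split between the two components can differ between the model and Sparta.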