Solving Milky Way-sized Systems with Haskap Pie: A Halo finding Algorithm with efficient Sampling, K-means clustering, tree-Assembly, Particle tracking, Python modules, Inter-code applicability, and Energy solving

Kirk S. S. Barrow, Thinh Huu Nguyen, Edward C. Skrabacz

TL;DR

Haskap Pie addresses the challenge of robustly identifying and tracking halos in Milky Way–sized systems across diverse simulation codes. It unifies overdensity finding, energy-based clustering, and forward/backward particle tracking into a Python-based workflow with memory- and load-balanced optimizations, enabling halo trees to be built on standard hardware. The method yields more complete halo populations (notably for halos with >100 particles), longer-lived subhalo tracks, and more physically consistent dynamical properties than Rockstar+Consistent Trees, while maintaining compatibility with AGORA and other simulations. This approach has significant implications for studies of mergers, satellite galaxies, and galaxy assembly, offering a flexible, accessible tool that scales from laptops to clusters. The work also highlights remaining limitations (e.g., completeness at low particle counts) and outlines clear directions for future development and integration with broader analysis pipelines.

Abstract

We describe a new Python-based stand-alone halo finding algorithm, Haskap Pie, that combines several methods of halo finding and tracking into a single calculation. Our halo-finder flexibly solves halos for simulations produced by eight simulation codes (ART-I, ENZO, RAMSES, CHANGA, GADGET-3, GEAR, AREPO, and GIZMO), handles both zoom-in and full-box N-body or hydrodynamical simulations, and includes a unified, robust set of pre-tuned parameters. When compared to Rockstar and Consistent Trees, our halo-finder tracks subhalos much longer and more consistently, produces halos with better constrained physical parameters, and returns a much denser halo mass function for halos with more than 100 particles. Our results also compare favorably to recently described specialized particle-tracking extensions to Rockstar. Haskap Pie is well-suited to a variety of studies of simulated galaxies and is particularly robust for a new generation of studies of merging and satellite galaxies. For our initial paper, we focus on describing our algorithm's ability to find and track halos and subhalos in complex Milky Way-sized halo systems.

Paper Structure

This paper contains 34 sections, 5 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: 2-D demonstration of the particle sampling technique. Left is a 2-D slice of the particles about the main halo in the AGORA ENZO simulation at $z= 0.0989$ and right is the region after sampling in 12 (inner region) or 48 (outer regions) directions in the $x-y$ plane at 70 radii, which are lightly shaded into the left plot. The size of the points is proportional to the square root of their mass. Insets show the density (purple) and particle number distributions (red) as labeled from the center by taking a 1-D slice in the $x$-direction. The bottom inset in each plot reports the remaining number of particles after both slices of the original 7,820,075 particles within a box bounding 3.5$r_{\rm vir}$. For this example, sampled densities at the center of annular sectors have a mean error of $\sim$4.1% and no error in enclosed mass at the annuli radial boundaries despite using less than 1/110th of the particles.
  • Figure 2: Figures showing the evolution of halo solving through our pipeline. Left: Clusters of halos and subhalos found for one candidate overdense volume with three iterations of k-means clustering showing the energy distributions of the halos (top) and their physical extent (bottom). Colored scatter points are particles found to be bound to the halos. Center: Overlapping results of all clusters found for all overdense volumes. Right: Halos confirmed by backward modeling and pruning for one timestep. Our method overpopulates halos and subhalos and then prunes by only including halos that have a sustained, unique physical presence across at least five consecutive timesteps. The visually apparent flattening of the particle number density distribution aids the identification of subhalos using k-means clustering.
  • Figure 3: Plots demonstrating halo redundancy for our test simulation. Top left: All halos found by our cluster-finding solution within a four virial radii box centered on the main halo ($1.3 \times 10^9$ M$_\odot$ at $z\sim7.5$) before pruning, showing multiple solutions for most halos and subhalos. Colors indicate the radius (200c) and sampled particle membership of corresponding halos. Top right: The halo catalog after four rounds of tracking and pruning. Bottom row: Left: The total halo counts from the combination of our overdensity-finding and energy-solving method without pruning (blue), after four rounds of particle tracking without pruning (orange), and the final halo counts with pruning throughout the calculation (green), all versus halo particle counts. Center: The inter-quartile range of the number of duplicates produced for each halo before (solid line) and after (dotted line) particle tracking without pruning, versus halo particle counts. A value of zero indicates halos are singular and not duplicated. Right: The same as the center plot but divided by bin counts to normalize duplicates by halo populations. For halos with at least 100 particles, the number density of duplicates increases with particle count with an average of more than seven duplicates per halo. The number of duplicates is therefore not limited by the particle-tracking algorithms.
  • Figure 4: Example per-halo single-core computational time for a sample of halos in a timestep, showing the three most expensive portions of the algorithm that threads must calculate independently: particle pruning (purple), particle sampling (red), and energy solving (green). Lines are also drawn in the corresponding colors showing hypothetical linear or quadratic relationships for each calculation type. Note that particles are only sampled when the number of particles in the search region is greater than 10$^4$ (shown as a vertical black line). The median and mean total calculation times (all three combined) per halo in this sample are $\sim$0.7 ms and $\sim$0.4 s, respectively. Particle sampling is projected to become the slowest calculation for halos with more than a few million dark matter particles unless further optimized.
  • Figure 5: Halo populations from an N-body-only simulation solved with Haskap Pie. Left: Halo mass function showing linear theory in black and our population in red for simulation data representing $z=0$. Center: Halo populations across redshifts as a fraction of linear theory (Press & Schechter 1974), with total halo counts at each redshift given in the legend. Right: Halo counts as a function of halo mass, colored by the age of the Universe in a color gradient. The blue line shows the initial mass function of halos. Our results are most complete for halos consisting of more than 100 particles and show trends associated with halo assembly.
  • ...and 7 more figures
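
The Figure 2 caption states that candidate halos are kept only if they have a sustained, unique physical presence across at least five consecutive timesteps. The following is a minimal Python sketch of the "sustained" half of that rule only. It is not taken from the Haskap Pie source: the function names, the dictionary layout (candidate label mapped to the integer timestep indices where it was detected), and the example data are all hypothetical, and the uniqueness test between overlapping candidates is not modeled.

```python
from typing import Dict, Iterable, Set


def longest_consecutive_run(timesteps: Iterable[int]) -> int:
    """Return the length of the longest run of consecutive integers."""
    steps = sorted(set(timesteps))
    best = run = 0
    prev = None
    for t in steps:
        # Extend the current run if t directly follows the previous step,
        # otherwise start a new run of length one.
        run = run + 1 if prev is not None and t == prev + 1 else 1
        best = max(best, run)
        prev = t
    return best


def prune_candidates(detections: Dict[str, Iterable[int]],
                     min_run: int = 5) -> Set[str]:
    """Keep candidates detected in at least `min_run` consecutive timesteps.

    `min_run=5` mirrors the threshold quoted in the Figure 2 caption; the
    input format is an assumption made for this illustration.
    """
    return {label for label, steps in detections.items()
            if longest_consecutive_run(steps) >= min_run}


# Hypothetical candidates: label -> timestep indices with a detection.
candidates = {
    "a": [10, 11, 12, 13, 14, 15],  # six consecutive steps
    "b": [10, 11, 13, 14, 15, 16],  # gap at 12: longest run is four
    "c": [3, 4, 5, 6, 7],           # exactly five consecutive steps
}
kept = prune_candidates(candidates)
```

In this toy input, candidate "b" appears in six timesteps overall but never in five consecutive ones, so a persistence test of this kind would discard it even though a simple detection count would not.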