
jFoF: GPU Cluster Finding with Gradient Propagation

Benjamin Horowitz, Adrian E. Bayer

TL;DR

The paper presents jFoF, a GPU-native Friends-of-Friends halo finder implemented in JAX that performs all neighbor searches, label propagation, and group construction on accelerators, eliminating host-device transfers. It introduces two CUDA-friendly neighbor-search strategies, a $k$-d tree and a linked-cell grid, which achieve up to an order-of-magnitude speedup over CPU FoF implementations while preserving catalog fidelity. Beyond performance, jFoF makes halo finding differentiable through two modes: frozen assignment, including decorated frozen assignments that provide surrogate mass gradients, and REINFORCE-based topological optimization. Together these enable end-to-end gradient-based optimization in cosmological pipelines. This work lays the groundwork for integrating differentiable halo catalogs with GPU-accelerated simulators, potentially enabling joint inference and subgrid-model calibration within fully differentiable cosmology workflows.
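
As a rough illustration of the frozen-assignment idea, the sketch below assumes that FoF labels come from an earlier, non-differentiable pass and are then held fixed, so gradients flow only through the positions of the member particles. The helpers `halo_centers`, `toy_positions`, and `frozen_loss` are hypothetical and not part of the jFoF API; `toy_positions` is a made-up stand-in for a differentiable simulator in which displacements scale linearly with a parameter `sigma`. Note that member counts (and hence halo masses) are piecewise constant under a frozen assignment and receive zero gradient; only continuous functions of the members, such as centers of mass, do.

```python
import jax
import jax.numpy as jnp


def halo_centers(positions, labels, num_halos):
    """Mean member position of each halo for a fixed (frozen) label assignment."""
    # Member counts depend only on the integer labels, so they carry no gradient.
    counts = jax.ops.segment_sum(jnp.ones(positions.shape[0]), labels,
                                 num_segments=num_halos)
    sums = jax.ops.segment_sum(positions, labels, num_segments=num_halos)
    return sums / counts[:, None]


def toy_positions(sigma, q, delta):
    # Hypothetical stand-in for a differentiable simulator:
    # particle displacements scale linearly with the parameter sigma.
    return q + sigma * delta


def frozen_loss(sigma, q, delta, labels, num_halos, target_centers):
    # `labels` come from an earlier FoF pass and are held fixed here;
    # gradients flow only through the member positions.
    centers = halo_centers(toy_positions(sigma, q, delta), labels, num_halos)
    return jnp.sum((centers - target_centers) ** 2)
```

For example, with four particles labelled `[0, 0, 1, 1]` and target centers built at `sigma = 0.8`, `jax.grad(frozen_loss)` evaluated at `sigma = 0.5` is negative and vanishes at `sigma = 0.8`, a toy analogue of the $\sigma_8$ recovery shown in Figure 4.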

Abstract

We present jFoF, a fully GPU-native Friends-of-Friends (FoF) halo finder designed for both high-performance simulation analysis and differentiable modeling. Implemented in JAX, jFoF achieves end-to-end acceleration by performing all neighbor searches, label propagation, and group construction directly on GPUs, eliminating costly host-device transfers. We introduce two complementary neighbor-search strategies, a standard $k$-d tree and a novel linked-cell grid, and demonstrate that jFoF attains up to an order-of-magnitude speedup over optimized CPU implementations while maintaining consistent halo catalogs. Beyond performance, jFoF enables gradient propagation through discrete halo-finding operations via both frozen-assignment and topological optimization modes. The topological optimization mode uses a REINFORCE-style estimator to allow smooth optimization of halo connectivity and membership, bridging continuous simulation fields with discrete structure catalogs. These capabilities make jFoF a foundation for differentiable inference, enabling end-to-end, gradient-based optimization of structure formation models within GPU-accelerated astrophysical pipelines. We make our code publicly available at https://github.com/bhorowitz/jFOF/.
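
The REINFORCE-style topological optimization can be sketched with a score-function estimator. This is a minimal illustration under assumptions of our own, not the paper's formulation: each candidate pair is linked independently with probability $p = \sigma((b - d)/\tau)$, a soft relaxation of the hard FoF criterion $d \le b$ for linking length $b$ and temperature $\tau$; the discrete loss is the squared difference between the number of groups and a target; and `count_groups`, `link_log_prob`, and `reinforce_grad` are hypothetical names. Groups are found by min-label propagation over the sampled links, and the estimate is $\widehat{\nabla_b} = \frac{1}{S}\sum_s (L_s - \bar{L}_{-s})\,\nabla_b \log p(\mathrm{links}_s)$ with a leave-one-out baseline $\bar{L}_{-s}$.

```python
import jax
import jax.numpy as jnp


def count_groups(n, pairs, active):
    """Number of FoF groups when only the `active` candidate pairs are linked.

    Min-label propagation: each particle repeatedly takes the smallest label
    among itself and its linked partners until nothing changes; each group
    then carries the index of its smallest member as its label.
    """
    i, j = pairs[:, 0], pairs[:, 1]

    def step(state):
        labels, _ = state
        # Inactive pairs propose label n, which never lowers any label.
        m = jnp.where(active, jnp.minimum(labels[i], labels[j]), n)
        new = labels.at[i].min(m).at[j].min(m)
        return new, jnp.any(new != labels)

    init = (jnp.arange(n), jnp.array(True))
    labels, _ = jax.lax.while_loop(lambda s: s[1], step, init)
    return jnp.sum(labels == jnp.arange(n))  # one root per group


def link_log_prob(b, dist, active, tau):
    # Independent Bernoulli links with p = sigmoid((b - dist) / tau).
    logits = (b - dist) / tau
    return jnp.sum(jnp.where(active, jax.nn.log_sigmoid(logits),
                             jax.nn.log_sigmoid(-logits)))


def reinforce_grad(key, b, dist, pairs, n, target_groups, tau=0.05, num_samples=256):
    """Score-function (REINFORCE) estimate of dE[loss]/db."""
    p = jax.nn.sigmoid((b - dist) / tau)
    keys = jax.random.split(key, num_samples)
    samples = jax.vmap(lambda k: jax.random.bernoulli(k, p))(keys)  # (S, P) link draws
    # Discrete, non-differentiable loss evaluated on each sampled topology.
    losses = jax.vmap(lambda a: (count_groups(n, pairs, a) - target_groups) ** 2)(samples)
    losses = losses.astype(jnp.float32)
    # Leave-one-out baseline: reduces variance without biasing the estimate.
    baseline = (losses.sum() - losses) / (num_samples - 1)
    scores = jax.vmap(lambda a: jax.grad(link_log_prob)(b, dist, a, tau))(samples)
    return jnp.mean((losses - baseline) * scores)
```

For four particles spaced 0.1 apart on a line, a target of one group, and `b = 0.05`, the estimate should come out negative, suggesting a longer linking length. Only the log-probability of the sampled topology is differentiated; the loss itself is treated as a black box.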

Paper Structure

This paper contains 13 sections, 8 equations, 10 figures, 1 table, 2 algorithms.

Figures (10)

  • Figure 1: Schematic illustration of the two spatial data structures used in jFoF for candidate neighbor search. (a) The $k$-d tree recursively partitions the particle set into axis-aligned hyperrectangles, enabling efficient range queries at arbitrary depths. (b) The linked-cell grid discretizes the simulation volume into fixed cubic cells, limiting neighbor searches to a local stencil of 9 cells (including self) in 2D. The grey boxes denote regions within the search radius ($k_{max}=8$) for a sample query particle.
  • Figure 2: Timing scaling of several FoF halo-finding implementations run on the same dark matter particle-mesh simulations, generated with FastPM in a $500\,h^{-1}$ Mpc box. When GPU-hours are compared with CPU core-hours, the three jFoF implementations outperform the C-based FoF implementation by up to an order of magnitude at large particle number.
  • Figure 3: A visualization of halos and sub-halos found in an example CAMELS simulation. We plot the three most massive halos in the volume and color the 30 largest sub-halos in each with random colors, overplotted on the total particle distribution. We find that our 6D implementation is able to identify merging and merged (sub)structures in phase space.
  • Figure 4: Demonstration of differentiating through frozen assignment to optimize a simple halo-based, field-level loss function at fixed initial phases. Each column corresponds to one optimization step with a different value of $\sigma_8$. The top panels show the projected halo field; bottom panels show residuals relative to the target. The reported $d \mathcal{L}/d\sigma_8$ values trace the sign and magnitude of the gradient as the optimizer converges toward the simulated true value ($\sigma_8 = 0.8$).
  • Figure 5: Comparison of the derivative of the halo power spectrum with respect to $\sigma_8$, computed via jFoF autodiff (black) and finite-difference estimation (red). The curves show ensemble averages over 100 realizations. The strong agreement and reduced noise in the autodiff results confirm that frozen-assignment differentiation provides stable and physically meaningful gradients.
  • ...and 5 more figures
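
The linked-cell search of Figure 1(b) can be sketched in JAX with fixed-size, padded candidate lists, which keep array shapes static on the accelerator. This is an illustrative 2D version under our own assumptions, not jFoF's implementation: a periodic square box of side `box`, cells at least one linking length wide, and a hypothetical `max_per_cell` cap on cell occupancy (the returned `overflow` flag reports when the cap is exceeded and candidates would be dropped). Sorting particles by cell id plays the role of the linked list, and each particle scans the 9 cells of its 3×3 stencil.

```python
import jax
import jax.numpy as jnp

# 3x3 stencil of cell offsets (9 cells including self) for the 2D case.
STENCIL_2D = jnp.array([(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)])


def friend_pairs(pos, box, link_len, max_per_cell):
    """Linked-cell FoF neighbor search in a periodic 2D box (illustrative sketch).

    Returns (cand, close, overflow): cand[i] holds padded candidate indices
    (-1 marks an empty slot), close[i] marks candidates within link_len, and
    overflow flags a cell holding more than max_per_cell particles.
    """
    n = pos.shape[0]
    n_side = int(box // link_len)  # cell width box / n_side >= link_len
    assert n_side >= 3, "need >= 3 cells per side so stencil cells are distinct"
    ij = jnp.floor(pos / (box / n_side)).astype(jnp.int32) % n_side
    cell_id = ij[:, 0] * n_side + ij[:, 1]
    order = jnp.argsort(cell_id)  # particles grouped by cell
    sorted_ids = cell_id[order]
    num_cells = n_side * n_side
    starts = jnp.searchsorted(sorted_ids, jnp.arange(num_cells))  # first slot per cell
    overflow = jnp.max(jnp.bincount(sorted_ids, length=num_cells)) > max_per_cell
    slots = jnp.arange(max_per_cell)

    def candidates(p_ij):
        nb = (p_ij[None, :] + STENCIL_2D) % n_side  # periodic neighbor cells, (9, 2)
        nb_id = nb[:, 0] * n_side + nb[:, 1]
        idx = starts[nb_id][:, None] + slots[None, :]  # (9, max_per_cell)
        idx_c = jnp.clip(idx, 0, n - 1)
        valid = (idx < n) & (sorted_ids[idx_c] == nb_id[:, None])
        return jnp.where(valid, order[idx_c], -1).reshape(-1)

    cand = jax.vmap(candidates)(ij)  # (n, 9 * max_per_cell)
    d = pos[jnp.clip(cand, 0)] - pos[:, None, :]
    d = d - box * jnp.round(d / box)  # minimum-image convention
    close = ((jnp.sum(d ** 2, axis=-1) <= link_len ** 2)
             & (cand >= 0) & (cand != jnp.arange(n)[:, None]))
    return cand, close, overflow
```

The `(cand, close)` arrays form an edge list that label propagation can consume directly; in 3D the stencil grows to 27 cells. Here `box` and `link_len` must be concrete Python numbers because they fix the grid size. Padding wastes some work in sparse cells but avoids dynamic shapes, a common trade-off for GPU code.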