
jaxsgp4: GPU-accelerated mega-constellation propagation with batch parallelism

Charlotte Priestley, Will Handley

Abstract

As the population of anthropogenic space objects transitions from sparse clusters to mega-constellations exceeding 100,000 satellites, traditional orbital propagation techniques face a critical bottleneck. Standard CPU-bound implementations of the Simplified General Perturbations 4 (SGP4) algorithm are ill-suited to the scale required by collision avoidance and Space Situational Awareness (SSA) tasks. This paper introduces \texttt{jaxsgp4}, an open-source, high-performance reimplementation of SGP4 built on the \texttt{JAX} library. \texttt{JAX} has gained traction in the landscape of computational research, offering Just-In-Time (JIT) compilation, automatic vectorisation, and automatic optimisation of code for CPU, GPU and TPU hardware. By refactoring the algorithm into a pure functional paradigm, we leverage these transformations to execute massively parallel propagations on modern GPUs. We demonstrate that \texttt{jaxsgp4} can propagate the entire Starlink constellation (9,341 satellites) to 1,000 future time steps each in under 4 ms on a single A100 GPU, representing a speedup of $1500\times$ over traditional C++ baselines. Furthermore, we argue that the use of 32-bit precision for SGP4 propagation tasks offers a principled trade-off, incurring negligible precision loss in exchange for a substantial gain in throughput on hardware accelerators.
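The batch-parallel pattern described in the abstract (a pure propagation function lifted over satellites and epochs via \texttt{vmap}, then JIT-compiled) can be sketched as follows. This is a minimal illustration, not the \texttt{jaxsgp4} API: \texttt{propagate\_one} is a placeholder circular-orbit model standing in for the full SGP4 kernel, and all element values below are invented for demonstration.

```python
import jax
import jax.numpy as jnp

MU = 398600.4418  # Earth gravitational parameter [km^3/s^2]

# Placeholder propagator: a circular-orbit mean-motion model standing in
# for the SGP4 kernel (illustration only, not the jaxsgp4 implementation).
def propagate_one(elements, t):
    """Position [km] of one satellite at time t [min] from mean elements."""
    n, raan, inc = elements[0], elements[1], elements[2]
    theta = n * t                                   # phase angle [rad]
    a = (MU / (n / 60.0) ** 2) ** (1.0 / 3.0)       # semi-major axis [km]
    # Orbital-plane position rotated by RAAN and inclination.
    x = a * (jnp.cos(theta) * jnp.cos(raan)
             - jnp.sin(theta) * jnp.cos(inc) * jnp.sin(raan))
    y = a * (jnp.cos(theta) * jnp.sin(raan)
             + jnp.sin(theta) * jnp.cos(inc) * jnp.cos(raan))
    z = a * jnp.sin(theta) * jnp.sin(inc)
    return jnp.stack([x, y, z])

# vmap over time steps (inner) and over satellites (outer), then JIT-compile
# the whole batch so it dispatches as a single fused kernel.
batch_propagate = jax.jit(
    jax.vmap(jax.vmap(propagate_one, in_axes=(None, 0)), in_axes=(0, None))
)

# Illustrative elements: [mean motion rad/min, RAAN rad, inclination rad].
elements = jnp.array([
    [0.0627, 1.0, 0.9],
    [0.0615, 2.0, 0.9],
    [0.0602, 3.0, 0.9],
])
times = jnp.linspace(0.0, 1440.0, 1000)  # 1,000 steps over one day [min]

positions = batch_propagate(elements, times)
print(positions.shape)  # (satellites, time steps, xyz) -> (3, 1000, 3)
```

Because the kernel is a pure function, the two \texttt{vmap} axes (satellites and epochs) compose freely, which is the mechanism that yields the flat scaling regime discussed below until the GPU saturates.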

Paper Structure

This paper contains 16 sections and 3 figures.

Figures (3)

  • Figure 1: Scaling performance comparison between JAX/GPU and C++/CPU implementations of the SGP4 algorithm, for a single satellite propagated to multiple time steps (left) and multiple satellites propagated to a single time (right). jaxsgp4 exploits the capacity of modern GPUs for massively parallel computation, exhibiting a flat scaling regime in which increased workload does not increase wall-clock time until hardware saturation (top). The 'break-even' point for each GPU (speedup = 1, denoted by the dashed line in the bottom panels), at which the benefit of parallel computation for large batch sizes overcomes the initial dispatch overhead, occurs at batch sizes of $\sim$300-500. The bottom panels also specify the maximum speedup achieved by jaxsgp4 over the C++ baseline; this occurs in the linear hardware-saturation regime, where the GPUs operate at maximum computational efficiency.
  • Figure 2: GPU-accelerated JAX vs C++ SGP4 performance comparison: blue indicates JAX is faster; red indicates C++ is faster. The standard C++ implementation is inherently single-threaded and was run on a standard CPU, while the JAX benchmark was run on an NVIDIA A100 GPU. Log-spaced axes show the relative performance of each implementation across different scales.
  • Figure 3: Relative error accumulation over two weeks running jaxsgp4 at 32-bit precision versus the standard SGP4 implementation, which uses 64-bit precision. The median and 5th-95th percentile regions are highlighted in blue. The dashed line depicts a conservative lower bound on physical error growth due to limitations of the SGP4 model itself. When the same analysis is performed at 64-bit precision, jaxsgp4 errors are of order $10^{-9}$ and $10^{-12}$ for position and velocity respectively, i.e. at the level of floating-point noise.
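The kind of precision comparison summarised in Figure 3 can be illustrated with a toy experiment (this is not the paper's error analysis, and the orbit model and constants below are invented for demonstration): run the same placeholder circular-orbit propagation over a two-week window at float32 and float64, and measure the relative position error between the two.

```python
import numpy as np

# Toy precision study: propagate a placeholder circular orbit for two weeks
# at 1-minute cadence in a given dtype and return the 2D positions [km].
# Illustrative values only; not the jaxsgp4 error analysis.
def positions(dtype):
    n = dtype(2.0 * np.pi / 95.6)                 # mean motion [rad/min], ~LEO period
    t = np.arange(0, 14 * 24 * 60, dtype=dtype)   # two weeks of 1-minute steps
    theta = n * t
    r = dtype(6920.0)                             # orbital radius [km]
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

p32 = positions(np.float32).astype(np.float64)    # low-precision run, upcast to compare
p64 = positions(np.float64)                       # reference run

# Per-epoch relative position error of the float32 run against float64.
rel_err = np.linalg.norm(p32 - p64, axis=-1) / np.linalg.norm(p64, axis=-1)
print(rel_err.max())
```

In this toy model the float32 error is dominated by rounding of the accumulated phase angle, so it grows with elapsed time; the paper's argument is that for SGP4 such numerical error remains well below the physical error floor of the model itself over operationally relevant horizons.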