
Making the RANMAR pseudorandom number generator in LAMMPS up to four times faster, with an implementation of jump-ahead

Hiroshi Haramoto, Kosuke Suzuki

TL;DR

The paper addresses the need for provably non-overlapping PRNG streams in massively parallel simulations by supplying a mathematically exact jump-ahead for the RANMAR generator used in LAMMPS. It recasts RANMAR as a recurrence over the residue ring $\mathbb{Z}/2^e\mathbb{Z}$, replacing the floating-point recurrence with an equivalent integer one so that the state can be advanced exactly. It then combines polynomial arithmetic with the Cayley–Hamilton theorem to compute arbitrary $J$-step jumps efficiently. The approach is implemented in C++ with NTL and integrated into LAMMPS without changing user workflows. The integer formulation makes generation roughly $2$–$4\times$ faster, and very large jumps (e.g., $J\approx 2^{120}$) remain practical to compute. The work offers a scalable, reproducible solution for parallel RNG streams. It also lays out a framework that can extend to larger recurrences by drawing on existing optimizations for polynomial arithmetic.
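The sketch below illustrates the floating-point-to-integer conversion. It is a minimal C++ example that assumes the classic RANMAR update, which we take to mirror LAMMPS's Marsaglia generator. That update is a lag-97/lag-33 subtractive recurrence plus a carry, decremented by $7654321/2^{24}$ modulo $16777213/2^{24}$. Every state value is a multiple of $2^{-24}$, so each floating-point operation is exact. Scaling the state by $2^{24}$ therefore gives an integer recurrence with bit-identical output. The struct names, `seed_both`, and its LCG-based fill are hypothetical illustrations, not LAMMPS's seeding routine or the authors' code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t MASK24 = (1u << 24) - 1;  // arithmetic mod 2^24
constexpr int32_t CD = 7654321;              // carry decrement, in units of 2^-24
constexpr int32_t CM = 16777213;             // carry modulus 2^24 - 3, in units of 2^-24
constexpr double SCALE = 16777216.0;         // 2^24

// Floating-point form: all state values are multiples of 2^-24 in [0, 1).
struct FloatRanmar {
  double u[98];                  // 1-based lag table, u[0] unused
  double c = 362436 / SCALE;     // classic initial carry
  double cd = CD / SCALE, cm = CM / SCALE;
  int i97 = 97, j97 = 33;
  double uniform() {
    double uni = u[i97] - u[j97];
    if (uni < 0.0) uni += 1.0;
    u[i97] = uni;
    if (--i97 == 0) i97 = 97;
    if (--j97 == 0) j97 = 97;
    c -= cd;
    if (c < 0.0) c += cm;
    uni -= c;
    if (uni < 0.0) uni += 1.0;
    return uni;
  }
};

// Integer form: the same state scaled by 2^24, so every value fits in 24 bits.
struct IntRanmar {
  uint32_t u[98];                // 1-based lag table, u[0] unused
  int32_t c = 362436;            // same initial carry, scaled by 2^24
  int i97 = 97, j97 = 33;
  uint32_t next24() {
    uint32_t x = (u[i97] - u[j97]) & MASK24;  // lagged subtraction mod 2^24
    u[i97] = x;
    if (--i97 == 0) i97 = 97;
    if (--j97 == 0) j97 = 97;
    c -= CD;
    if (c < 0) c += CM;                       // carry update mod 2^24 - 3
    assert(0 <= c && c < CM);                 // invariant: carry stays in [0, CM)
    return (x - static_cast<uint32_t>(c)) & MASK24;  // output mod 2^24
  }
  double uniform() { return next24() / SCALE; }
};

// Hypothetical: fill both tables with identical 24-bit values from a
// throwaway LCG. This is NOT LAMMPS's seeding procedure.
void seed_both(FloatRanmar& f, IntRanmar& g, uint32_t s) {
  for (int k = 1; k <= 97; ++k) {
    s = s * 1664525u + 1013904223u;
    g.u[k] = s >> 8;            // keep the top 24 bits
    f.u[k] = g.u[k] / SCALE;    // same value as a multiple of 2^-24
  }
}

// True if the two forms emit bit-identical doubles for n draws.
bool streams_agree(uint32_t seed, int n) {
  FloatRanmar f;
  IntRanmar g;
  seed_both(f, g, seed);
  for (int k = 0; k < n; ++k)
    if (f.uniform() != g.uniform()) return false;
  return true;
}
```

Calling `streams_agree(seed, n)` compares the two forms over `n` draws. The speedup quoted above comes from the paper's measurements; this sketch only shows the equivalence, not the timing.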

Abstract

Massively parallel molecular simulations require pseudorandom number streams that are provably non-overlapping and reproducible across thousands of compute units. In the widely used LAMMPS package, the standard RANMAR generator lacks a mathematically exact jump-ahead mechanism. Instead, distinct seeds are typically assigned to compute units, which does not guarantee disjoint streams. We introduce a mathematically exact jump-ahead extension for RANMAR in LAMMPS. With it, a single random sequence can be partitioned into consecutive, non-overlapping blocks of length $J$, with one block assigned to each compute unit under a formal non-overlap guarantee. Our approach is an algebraic reformulation that casts state advancement as polynomial computations over finite residue rings. This enables efficient jump-ahead even for very large $J$ while keeping memory usage small. We implement the extension in C++ using the Number Theory Library (NTL) and integrate it into LAMMPS without altering user workflows. Beyond enabling exact partitioning, converting the 24-bit floating-point recurrence to an equivalent 24-bit integer recurrence accelerates generation itself. Across diverse CPUs, generation is approximately two to four times faster than the floating-point baseline. Computing very large jumps (e.g., $J \approx 2^{120}$) remains practical.
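The jump-ahead idea can be sketched in C++. The sketch assumes the integer form of the lag sequence, $U_m \equiv U_{m-97} - U_{m-33} \pmod{2^{24}}$, whose characteristic polynomial is $P(t) = t^{97} + t^{64} - 1$ over $\mathbb{Z}/2^{24}\mathbb{Z}$. Because $P$ is monic, division by $P$ works over this ring. With $\sigma$ the shift operator, $P(\sigma)$ annihilates every sequence satisfying the recurrence; this is the Cayley–Hamilton statement for the companion matrix. Writing $t^J = q(t)P(t) + g(t)$ with $\deg g < 97$ therefore gives $U_{n+J+j} = \sum_{i=0}^{96} g_i U_{n+i+j}$ for every $j$. The carry is an arithmetic progression modulo $16777213$, so it jumps in closed form. The sketch uses schoolbook polynomial products and a 64-bit $J$, whereas the paper uses NTL and much larger $J$. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t MASK24 = (uint64_t{1} << 24) - 1;  // arithmetic mod 2^24
constexpr int R = 97, S = 33;                          // long and short lags
using Seq = std::vector<uint32_t>;   // window (U_n, ..., U_{n+96}), oldest first
using Poly = std::vector<uint64_t>;  // R coefficients, each kept mod 2^24

// a(t) * b(t) mod P(t), using t^d = t^{d-97} - t^{d-33} for d >= 97.
Poly mulmod(const Poly& a, const Poly& b) {
  Poly p(2 * R - 1, 0);
  for (int i = 0; i < R; ++i)
    for (int j = 0; j < R; ++j)
      p[i + j] = (p[i + j] + a[i] * b[j]) & MASK24;
  for (int d = 2 * R - 2; d >= R; --d) {     // reduce highest degree first
    p[d - R] = (p[d - R] + p[d]) & MASK24;
    p[d - S] = (p[d - S] - p[d]) & MASK24;   // wraps mod 2^64, then mod 2^24
  }
  p.resize(R);
  return p;
}

// g(t) = t^J mod P(t) by square-and-multiply.
Poly t_pow_mod(uint64_t J) {
  Poly result(R, 0), base(R, 0);
  result[0] = 1;  // the polynomial 1
  base[1] = 1;    // the polynomial t
  for (; J > 0; J >>= 1) {
    if (J & 1) result = mulmod(result, base);
    base = mulmod(base, base);
  }
  return result;
}

// Advance the window by J steps: U_{n+J+j} = sum_i g_i * U_{n+i+j}.
Seq jump_window(const Seq& w, uint64_t J) {
  Poly g = t_pow_mod(J);
  Seq x(w);  // extend to U_n .. U_{n+192} with the plain recurrence
  for (int m = R; m < 2 * R - 1; ++m)
    x.push_back(static_cast<uint32_t>((x[m - R] - x[m - S]) & MASK24));
  Seq out(R);
  for (int j = 0; j < R; ++j) {
    uint64_t acc = 0;
    for (int i = 0; i < R; ++i) acc = (acc + g[i] * x[i + j]) & MASK24;
    out[j] = static_cast<uint32_t>(acc);
  }
  return out;
}

// One plain step, for checking: U_{n+97} = U_n - U_{n+64}.
Seq step_window(const Seq& w) {
  Seq next(w.begin() + 1, w.end());
  next.push_back(static_cast<uint32_t>((w[0] - w[R - S]) & MASK24));
  return next;
}

// Carry jump: C_{n+J} = (C_n - J * CD) mod CM, with c in [0, CM).
int64_t jump_carry(int64_t c, uint64_t J) {
  const int64_t CD = 7654321, CM = 16777213;
  int64_t shift = static_cast<int64_t>((J % CM) * CD % CM);
  return (c - shift + CM) % CM;
}
```

Under these assumptions, rank $r$ would start from `jump_window(w, r * J)` and `jump_carry(c, r * J)`. This yields consecutive, non-overlapping blocks of length $J$, provided $r \cdot J$ fits in 64 bits. Each jump costs $O(97^2 \log J)$ ring operations in this naive form. Mapping the window onto LAMMPS's circular lag table and its two indices is not shown.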

Paper Structure

This paper contains 7 sections, 26 equations, 2 tables, and 3 algorithms.