The influence of the random numbers quality on the results in stochastic simulations and machine learning

Benjamin A. Antunes

TL;DR

The paper asks whether the statistical quality of a PRNG influences outcomes in stochastic simulations and ML. It runs a controlled, cross-domain study of seven PRNGs (three LCGs of increasing quality, MT, PCG, Philox, and a C rand) across four workloads: a large-scale ABM, two MNIST implementations, and CartPole RL, with 30 repeats per generator and fixed seeds. The results show that extremely poor PRNGs can cause substantial biases in ABM dynamics, MNIST accuracy, and RL performance, while mid- and good-quality generators generally match top-tier PRNGs, except in sensitive tasks such as CartPole, where quality matters more. The findings suggest that practitioners should first ensure their PRNG meets a robust statistical quality threshold, which avoids systematic errors in stochastic computations, and then choose among qualifying generators based on performance constraints and implementation considerations.

Abstract

Pseudorandom number generators (PRNGs) are ubiquitous in stochastic simulations and machine learning (ML), where they drive sampling, parameter initialization, regularization, and data shuffling. While widely used, the potential impact of PRNG statistical quality on computational results remains underexplored. In this study, we investigate whether differences in PRNG quality, as measured by standard statistical test suites, can influence outcomes in representative stochastic applications. Seven PRNGs were evaluated, ranging from low-quality linear congruential generators (LCGs) with known statistical deficiencies to high-quality generators such as Mersenne Twister, PCG, and Philox. We applied these PRNGs to four distinct tasks: an epidemiological agent-based model (ABM), two independent from-scratch MNIST classification implementations (Python/NumPy and C++), and a reinforcement learning (RL) CartPole environment. Each experiment was repeated 30 times per generator using fixed seeds to ensure reproducibility, and outputs were compared using appropriate statistical analyses. Results show that very poor statistical quality, as in the "bad" LCG failing 125 TestU01 Crush tests, produces significant deviations in ABM epidemic dynamics, reduces MNIST classification accuracy, and severely degrades RL performance. In contrast, mid- and good-quality LCGs, despite failing a limited number of Crush or BigCrush tests, performed comparably to top-tier PRNGs in most tasks, with the RL experiment being the primary exception where performance scaled with statistical quality. Our findings indicate that, once a generator meets a sufficient statistical robustness threshold, its family or design has negligible impact on outcomes for most workloads, allowing selection to be guided by performance and implementation considerations. However, the use of low-quality PRNGs in sensitive stochastic computations can introduce substantial and systematic errors.
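
To make the experimental design in the abstract concrete, the following is a minimal sketch, not the paper's code. It assumes the classic RANDU parameters (a = 65539, c = 0, m = 2^31) as a stand-in for a low-quality LCG, since the abstract does not give the parameters of the paper's "bad" LCG. It also uses a toy Monte Carlo estimate of pi in place of the paper's four workloads. The names `LCG`, `randu`, and `estimate_pi` are hypothetical. The sketch shows one known LCG deficiency (a linear relation among consecutive outputs) and the pattern of 30 fixed-seed replicates per generator.

```python
import random


class LCG:
    """Minimal linear congruential generator: x_{k+1} = (a * x_k + c) mod m."""

    def __init__(self, seed, a, c, m):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self):
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def random(self):
        # Uniform float in [0, 1); same method name as random.Random.random().
        return self.next_int() / self.m


def randu(seed):
    # Assumption: RANDU parameters, used here only as an example of a weak LCG.
    # With c = 0 and m = 2**31 the seed should be odd.
    return LCG(seed, a=65539, c=0, m=2**31)


def estimate_pi(rng, n_points=5_000):
    # Toy stochastic workload (not from the paper): fraction of random
    # points that fall inside the unit quarter-circle, scaled by 4.
    inside = sum(
        1 for _ in range(n_points)
        if rng.random() ** 2 + rng.random() ** 2 < 1.0
    )
    return 4.0 * inside / n_points


# Known RANDU defect: x_{k+2} = 6 * x_{k+1} - 9 * x_k (mod 2**31),
# so consecutive triples fall on a small number of planes.
gen = randu(seed=1)
xs = [gen.next_int() for _ in range(1000)]
lattice_ok = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % 2**31 == 0
    for k in range(len(xs) - 2)
)

# 30 replicates per generator with fixed (odd) seeds, so every run is reproducible.
seeds = [2 * i + 1 for i in range(30)]
results = {
    "randu": [estimate_pi(randu(s)) for s in seeds],
    "mt19937": [estimate_pi(random.Random(s)) for s in seeds],  # Mersenne Twister
}

for name, values in results.items():
    print(name, "mean estimate:", sum(values) / len(values))
```

Because both generators expose the same `random()` method, the workload does not need to know which one it receives. That lets the per-generator outputs be compared on identical seeds, in the spirit of the comparison the abstract describes.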

Paper Structure

This paper contains 6 sections and 8 figures.

Figures (8)

  • Figure 1: Mean epidemic curves for each PRNG. The poor-quality LCG shows a visible displacement in amplitude and timing.
  • Figure 2: Individual epidemic curves for each PRNG across $30$ replicates.
  • Figure 3: Violin plots for MNIST (NumPy) classification accuracy by PRNG.
  • Figure 4: Box plots for MNIST (NumPy) classification accuracy by PRNG.
  • Figure 5: Violin plots for MNIST (C++) classification accuracy by PRNG.
  • ...and 3 more figures