Statistical testing of random number generators and their improvement using randomness extraction

Cameron Foreman, Richie Yeung, Florian J. Curchod

TL;DR

This study develops a tunable statistical testing environment (STE) for rigorous RNG evaluation, benchmarked on three widely used generators (32-bit LFSR, Intel RDSEED, IDQ Quantis) using multiple standard test suites. It introduces a four-tier randomness extraction hierarchy (deterministic, seeded, two-source, and physical device-independent) implemented via the Circulant extractor in Cryptomite, coupled with post-processing guided by min-entropy estimates. Across levels 2–4, post-processing significantly improves statistical properties, with level 4 leveraging semi-device-independent quantum protocols to certify additional entropy; however, some sources remain challenging due to intrinsic min-entropy limitations. The authors provide open-source access to STE and the extraction toolkit, demonstrating practical pathways to robust RNGs beyond standard certification tests and highlighting the limits of statistical testing in guaranteeing cryptographic unpredictability. These results have direct implications for designing and certifying cryptographic RNGs by combining diverse extraction paradigms with comprehensive, repeatable statistical testing.

Abstract

Random number generators (RNGs) are notoriously challenging to build and test, especially for cryptographic applications. While statistical tests cannot definitively guarantee an RNG's output quality, they are a powerful verification tool and the only universally applicable testing method. In this work, we design, implement, and present various post-processing methods, using randomness extractors, to improve the RNG output quality and compare them through statistical testing. We begin by performing intensive tests on three RNGs -- the 32-bit linear feedback shift register (LFSR), Intel's 'RDSEED,' and IDQuantique's 'Quantis' -- and compare their performance. Next, we apply the different post-processing methods to each RNG and conduct further intensive testing on the processed output. To facilitate this, we introduce a comprehensive statistical testing environment, based on existing test suites, that can be parametrised for lightweight (fast) to intensive testing.

Paper Structure (44 sections, 19 equations, 8 figures, 33 tables)


Figures (8)

  • Figure S1: This figure illustrates our implementation set-up. The black box represents one of the initial RNGs that we test, and the dashed box denotes the new (in principle, improved) RNG with additional post-processing applied.
  • Figure S2: An illustration of the set-up that we consider. An RNG generates a bit string $X=x$ of length $n$. In this work, we first study the statistical properties of the realisation $x$ of the (random variable) $X$. Then, we analyse the effects of different post-processing methods applied to it.
  • Figure S3: Illustration of the set of sources, or input distributions, that can be successfully extracted from by different randomness extraction methods. (Right) weak input distributions and (Left) second input, or weak seed, distributions. Deterministic extractors (level 1) require additional properties on the weak input but do not need a second input source. Seeded extractors (level 2) relax the need for additional properties of the weak input and extract from sources with min-entropy only, at the cost
  • Figure S4: The plots show, for each initial RNG at each post-processing level, (left) the number of statistical tests failed and (right) the number failed or flagged as suspicious. The $x$ axis indicates the level: step 0 is the initial RNG with no additional post-processing, and steps 1--4 are deterministic, seeded, two-source, and physical extraction, respectively. The $y$ axis is the number of statistical tests failed (left) or failed or suspicious (right), out of 4600, on a logarithmic scale: for $f$ such tests, $y = \log_2(f+1)$. The shaded region in the left plot marks the successful region, in which the RNG fails fewer than 7.5 tests, and the white region marks the 'unacceptable' region, in which, with high probability, near-perfect randomness is not produced. We note that we are unable to use the 32-bit LFSR at level 4 because of its low initial estimated min-entropy rate, $\alpha_{\mathsf{RNG}}$, as detailed and evaluated in \ref{sec:rng-analysis}.
  • Figure S5: Here, level 1 of our post-processing methods is performed by using a deterministic extractor, namely the Von Neumann extractor, on the initial output of the RNG.
  • ...and 3 more figures
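
The Von Neumann extractor named in Figure S5 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `von_neumann_extract` and the convention that the pair 01 maps to 0 and 10 maps to 1 are assumptions made here for illustration. The procedure reads the input in non-overlapping pairs, discards equal pairs, and outputs the first bit of each unequal pair.

```python
from typing import Iterable, List


def von_neumann_extract(bits: Iterable[int]) -> List[int]:
    """Von Neumann debiasing of a bit sequence (illustrative sketch).

    Assumed convention: pair (0, 1) -> 0, pair (1, 0) -> 1;
    pairs (0, 0) and (1, 1) are discarded, as is a trailing unpaired bit.
    """
    out: List[int] = []
    it = iter(bits)
    for first in it:
        second = next(it, None)
        if second is None:
            break  # odd-length input: drop the last, unpaired bit
        if first != second:
            out.append(first)  # the first bit of an unequal pair is the output
    return out


if __name__ == "__main__":
    sample = [0, 1, 1, 0, 0, 0, 1, 1, 1]
    print(von_neumann_extract(sample))
```

The output is unbiased only when the input bits are independent with a fixed bias, which is the kind of additional property on the weak input that the Figure S3 caption says deterministic extractors (level 1) require. The output length is variable and at most half the input length.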

Theorems & Definitions (10)

  • Definition 1: Min-entropy
  • Definition 2: Block min-entropy
  • Definition 3: Statistical distance
  • Definition 4: $\epsilon$-perfect randomness
  • Definition 5: p-value
  • Definition 6: Deterministic randomness extractor
  • Definition 7: Seeded randomness extractor
  • Definition 8: Strong seeded extractor
  • Definition 9: Two-source randomness extractor
  • Definition 10: Strong two-source extractor
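
For orientation, the first few definitions listed above are usually stated as follows in the randomness extraction literature. These are the standard textbook forms, given as a sketch rather than the paper's exact wording: the symbols $\mathrm{SD}$, $U_m$ (the uniform distribution on $\{0,1\}^m$), and $\alpha$ are notational assumptions, and the paper's versions may additionally condition on an adversary's side information.

$$
H_\infty(X) = -\log_2 \max_{x \in \{0,1\}^n} \Pr[X = x], \qquad \alpha = \frac{H_\infty(X)}{n}
$$

$$
\mathrm{SD}(X, Y) = \frac{1}{2} \sum_{z} \left| \Pr[X = z] - \Pr[Y = z] \right|
$$

$$
Z \in \{0,1\}^m \text{ is } \epsilon\text{-perfectly random if } \mathrm{SD}(Z, U_m) \le \epsilon
$$

As a small worked example, a 2-bit source that outputs 00 or 11 with probability $1/2$ each has $\max_x \Pr[X = x] = 1/2$, so $H_\infty(X) = 1$ and $\alpha = 1/2$. Its statistical distance from $U_2$ is $\frac{1}{2}\left(2 \cdot \left|\tfrac{1}{2} - \tfrac{1}{4}\right| + 2 \cdot \left|0 - \tfrac{1}{4}\right|\right) = \tfrac{1}{2}$.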