Transformer models as an efficient replacement for statistical test suites to evaluate the quality of random numbers

Rishabh Goel; YiZi Xiao; Ramin Ramezani

TL;DR

The paper addresses the need for efficient validation of randomness, particularly for QRNG outputs, by replacing the slow, one-test-at-a-time NIST STS with an encoder-only Transformer that predicts multi-label pass probabilities for several STS tests. Through a hyper-parameter search, the authors identify a compact model (1 encoder layer, a single attention head, embedding size 192) with an averaging layer that achieves a Macro F1-score above 0.96 and runs substantially faster than the NIST STS and even LSTM baselines. The approach demonstrates that Transformers can parallelize and scale randomness evaluation while maintaining accuracy, offering a practical path toward replacing traditional test suites. The work suggests broad applicability to real-time randomness validation and lays the groundwork for extending the approach to the full set of NIST STS tests.

Abstract

Random numbers are essential in a variety of fields, and validating them remains important for safety. A Quantum Random Number Generator (QRNG) can theoretically generate truly random numbers; however, their quality still needs to be thoroughly validated. Generally, the task of validating random numbers has been delegated to statistical tests such as those in the NIST Statistical Test Suite (STS), which are often slow and perform only one test at a time. Our work presents a deep learning model based on the Transformer architecture that 1) performs multiple NIST STS tests at once, and 2) runs much faster. The model outputs multi-label classification results indicating whether a sequence passes each of these statistical tests. We performed a thorough hyper-parameter optimization to converge on the best possible model and, as a result, achieved a high degree of accuracy, with a Macro F1-score above 0.96. We also compared this model to a conventional deep learning method for quantifying randomness (Long Short-Term Memory recurrent neural networks) and showed that our model achieved similar performance while being much more efficient and scalable. The high performance and efficiency of this Transformer-based deep learning model show that it can be a viable replacement for the NIST STS for validating random numbers.
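The Macro F1-score mentioned above weights every statistical test equally: an F1 score is computed per test and the results are averaged. The following is a minimal sketch of that metric, assuming binary pass/fail labels per test. The function name `macro_f1` and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores.

    Rows are binary sequences under evaluation; columns are statistical
    tests (1 = pass, 0 = fail). Layout is an assumption for illustration.
    """
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy example: 4 sequences, 2 hypothetical tests.
y_true = [[1, 1], [1, 0], [0, 1], [1, 1]]
y_pred = [[1, 1], [1, 1], [0, 1], [0, 1]]
score = macro_f1(y_true, y_pred)
```

In this toy example the first test has F1 = 4/5 and the second has F1 = 6/7, so the macro average is their mean. Because each test contributes equally, a test that is rarely failed cannot be masked by strong performance on the others.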

Paper Structure (15 sections, 4 equations, 7 figures, 3 tables)

Introduction
Materials and Methods
Dataset
Model Training and Validation
Handling Varying Input Size
Experimental Procedure
Results
Discussion and Analysis
Comparative Analysis
Encoder Layers
Embedding Size
Number of Attention Heads
Conclusion
Concrete data values
F1 Scores

Figures (7)

Figure 1: Baseline model architecture. The tokenizer includes positional encoding and embeddings. The flatten layer is swapped with the averaging layer when constructing the final model.
Figure 2: The effects on the shape of the input are illustrated. The averaging layer averages along each column to reduce the sequence length dimension to 1. The result is a vector that is the length of the embedding size. This vector is then passed into the fully connected layer as in Figure 1.
Figure 3: Performance of our Transformer-based model compared to the more widely used LSTM architecture.
Figure 4: Runtime of the Transformer model versus the NIST STS. Each model and the STS had to run through a test set of 20000 binary sequences for each input size.
Figure 5: Encoder Layers versus Macro F1 scores
...and 2 more figures
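The averaging step described in the Figure 2 caption can be sketched as a column-wise mean over the encoder output. This is a minimal illustration in plain Python, assuming the encoder output is a list of `seq_len` rows with `embed_size` values each; the function names and the small numbers are hypothetical, not taken from the paper's implementation.

```python
def average_over_sequence(encoded):
    """Column-wise mean: (seq_len, embed_size) -> (embed_size,)."""
    seq_len = len(encoded)
    return [sum(column) / seq_len for column in zip(*encoded)]


def flatten(encoded):
    """Row-major flatten: (seq_len, embed_size) -> (seq_len * embed_size,)."""
    return [value for row in encoded for value in row]


# Two hypothetical encoder outputs with the same embedding size (3)
# but different sequence lengths (2 and 4).
short = [[1.0, 2.0, 3.0],
         [3.0, 4.0, 5.0]]
long = [[1.0, 0.0, 2.0],
        [1.0, 0.0, 2.0],
        [3.0, 4.0, 2.0],
        [3.0, 4.0, 2.0]]

avg_short = average_over_sequence(short)
avg_long = average_over_sequence(long)
```

Both averaged vectors have length 3, the embedding size, whereas `flatten` would produce vectors of length 6 and 12. That fixed output length is what lets the same fully connected layer accept inputs of varying sequence length.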
