Machine Learning Predictors for Min-Entropy Estimation
Javier Blanco-Romero, Vicente Lorenzo, Florina Almenares Mendoza, Daniel Díaz-Sánchez
TL;DR
This work investigates how machine-learning predictors estimate min-entropy in RNG outputs, focusing on the distinction between average min-entropy and traditional min-entropy under correlated, non-iid data. It develops a theoretical framework based on order-$p$ Markov chains and Generalized Binary Autoregressive Models (gbAR(p)), proving convergence relationships and entropy formulas that link $H_\infty$, $\tilde{H}_\infty$, and their per-bit variants. Through Monte Carlo data generation and experiments with RCNN and GPT-2 predictors, the authors show that ML models tend to estimate the average min-entropy (via modeling conditional probabilities) and can outperform NIST SP 800-90B predictors in certain low-entropy and multi-bit target scenarios, while highlighting the impact of the number of target bits on entropy estimates. The results emphasize the need to consider target-bit counts in entropy assessment for RNGs and suggest that multi-token prediction approaches offer a path to more robust entropy estimation in cryptographic applications, albeit with substantial computational costs in high-entropy regimes.
Abstract
This study investigates the application of machine learning predictors for min-entropy estimation in Random Number Generators (RNGs), a key component in cryptographic applications where accurate entropy assessment is essential for cybersecurity. Our research indicates that these predictors, and indeed any predictor that leverages sequence correlations, primarily estimate average min-entropy, a metric not extensively studied in this context. We explore the relationship between average min-entropy and the traditional min-entropy, focusing on their dependence on the number of target bits being predicted. Utilizing data from Generalized Binary Autoregressive Models, a subset of Markov processes, we demonstrate that machine learning models (including a hybrid of convolutional and recurrent Long Short-Term Memory layers and the transformer-based GPT-2 model) outperform traditional NIST SP 800-90B predictors in certain scenarios. Our findings underscore the importance of considering the number of target bits in min-entropy assessment for RNGs and highlight the potential of machine learning approaches in enhancing entropy estimation techniques for improved cryptographic security.
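To make the distinction between the two quantities concrete, the following is a minimal sketch, not the paper's framework. It assumes the standard definitions $H_\infty(X) = -\log_2 \max_x P(x)$ and $\tilde{H}_\infty(X \mid Y) = -\log_2 \mathbb{E}_y[\max_x P(x \mid y)]$, and it uses a two-state first-order Markov chain as a toy stand-in for a gbAR(p) source. The transition matrix `P` and all function names are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

def stationary(P):
    """Stationary distribution of a 2-state chain with transitions P[prev][next]."""
    a, b = P[0][1], P[1][0]  # flip probabilities 0->1 and 1->0 (assumes a + b > 0)
    return [b / (a + b), a / (a + b)]

def path_prob(P, prev, bits):
    """Probability of emitting `bits` right after the conditioning bit `prev`."""
    prob = 1.0
    for bit in bits:
        prob *= P[prev][bit]
        prev = bit
    return prob

def avg_min_entropy(P, n):
    """Average min-entropy of n target bits given the previous bit.

    This is -log2 of the expected success probability of the best guesser
    that is allowed to see the preceding bit.
    """
    pi = stationary(P)
    guess = sum(
        pi[prev] * max(path_prob(P, prev, bits) for bits in product((0, 1), repeat=n))
        for prev in (0, 1)
    )
    return -math.log2(guess)

def min_entropy(P, n):
    """Min-entropy of an n-bit block drawn from the stationary chain (no conditioning)."""
    pi = stationary(P)
    best = max(
        pi[bits[0]] * path_prob(P, bits[0], bits[1:])
        for bits in product((0, 1), repeat=n)
    )
    return -math.log2(best)

# Hypothetical "sticky" source: each bit repeats the previous one with probability 0.9.
P = [[0.9, 0.1], [0.1, 0.9]]
for n in (1, 2, 4, 8):
    print(n, min_entropy(P, n) / n, avg_min_entropy(P, n) / n)
```

For this toy source, a single unconditioned bit has a min-entropy of 1 bit, while a guesser that sees the previous bit succeeds with probability 0.9, giving an average min-entropy of about 0.152 bits. The per-bit min-entropy of an n-bit block decreases as n grows, which is one simple way to see why the number of target bits matters when comparing the two measures.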
