On Using Quasirandom Sequences in Machine Learning for Model Weight Initialization

Andriy Miranskyy, Adam Sorrenti, Viral Thakar

TL;DR

This work investigates whether Sobol' quasirandom sequences can improve neural network weight initialization relative to conventional pseudorandom initialization. The authors implement ten initialization schemes with QRNG-based sampling in Keras/TensorFlow and evaluate MLP, CNN, LSTM, and Transformer models on MNIST, CIFAR-10, and IMDB with SGD and Adam, conducting 240 configuration runs (each repeated 100 times) to compare training speed and final accuracy. The key finding is that QRNG-based initializers achieve higher median accuracy or reach the target accuracy more quickly in about 60% of experiments, with top-quantile gains between ~0.0775 and ~0.3550 and a seed-selection overhead of Δ_Q = 4 epochs; nine of the ten initializers show benefits, the exception being Random Uniform. These results suggest that QRNG-based initialization can speed up training and improve performance in many scenarios, though seed-selection strategies and generalizability require further refinement and validation on larger-scale tasks.

Abstract

The effectiveness of training neural networks directly impacts computational costs, resource allocation, and model development timelines in machine learning applications. An optimizer's ability to train the model adequately (in terms of trained model performance) depends on the model's initial weights. Model weight initialization schemes use pseudorandom number generators (PRNGs) as a source of randomness. We investigate whether replacing PRNGs with low-discrepancy quasirandom number generators (QRNGs), namely Sobol' sequences, as the source of randomness for initializers can improve model performance. We examine Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Transformer architectures trained on the MNIST, CIFAR-10, and IMDB datasets using the SGD and Adam optimizers. Our analysis uses ten initialization schemes: Glorot, He, and LeCun (each in Uniform and Normal variants), Orthogonal, Random Normal, Truncated Normal, and Random Uniform. Models with weights set using PRNG- and QRNG-based initializers are compared pairwise for each combination of dataset, architecture, optimizer, and initialization scheme. Our findings indicate that QRNG-based neural network initializers either reach a higher accuracy or achieve the same accuracy more quickly than PRNG-based initializers in 60% of the 120 experiments conducted. Thus, using QRNG-based initializers instead of PRNG-based initializers can speed up and improve model training.
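To make the idea of swapping the source of randomness concrete, the following is a minimal sketch of a Glorot Uniform initializer fed by a Sobol' sequence instead of a PRNG. It is an illustration under stated assumptions and not the authors' Keras/TensorFlow implementation: it uses SciPy's `scipy.stats.qmc.Sobol`, the function name `sobol_glorot_uniform` is hypothetical, and the hypothetical `dim` argument (which Sobol' coordinate to read, standing in for a seed) is an assumption about how different layers might be given different sequences.

```python
import numpy as np
from scipy.stats import qmc


def sobol_glorot_uniform(fan_in, fan_out, dim=0):
    """Glorot Uniform weights of shape (fan_in, fan_out) drawn from a Sobol' sequence.

    `dim` is a hypothetical stand-in for a seed: it selects which coordinate
    of the Sobol' sequence supplies the values.
    """
    n = fan_in * fan_out
    # Standard Glorot Uniform bound: weights lie in [-limit, limit].
    limit = np.sqrt(6.0 / (fan_in + fan_out))

    # Sobol' points are best drawn in blocks of 2**m, so draw the next
    # power of two and truncate to the n values that are needed.
    m = max(int(np.ceil(np.log2(n))), 0)
    sampler = qmc.Sobol(d=dim + 1, scramble=False)
    u = sampler.random_base2(m)[:n, dim]  # values in [0, 1)

    # Map [0, 1) onto [-limit, limit) and reshape to the weight matrix.
    return ((2.0 * u - 1.0) * limit).reshape(fan_in, fan_out)


weights = sobol_glorot_uniform(64, 32, dim=3)
print(weights.shape, float(weights.min()), float(weights.max()))
```

Because the sequence is unscrambled, the same `dim` always yields the same weights, which makes a run reproducible but also means that any variation between runs has to come from choosing a different coordinate (or from enabling scrambling).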

Paper Structure

This paper contains 78 sections, 25 equations, 30 figures, 12 tables, and 1 algorithm.

Figures (30)

  • Figure 1: Histogram of the $A_{\mathcal{Q}}^{\max} - A_{\mathcal{P}}^{\max}$ values for three final outcomes.
  • Figure 2: Empirical cumulative distribution function of $|A_{\mathcal{Q}}^{\max} - A_{\mathcal{P}}^{\max}| + 10^{-5}$ for three final outcomes. The $10^{-5}$ term is added to enable rendering of the $x$-axis values on a log scale.
  • Figure 3: A two-dimensional projection of the first 1024 draws of the Keras/TensorFlow pseudorandom number generator (top pane) and the Sobol' sequences (bottom pane). Axis labels denote seed values of the random generator.
  • Figure 4: Sample draws from univariate uniform, normal, and truncated normal distributions.
  • Figure 5: Time needed to draw $N_{\max} = 10, 100, 1000, 10000$ values from the random normal $\mathcal{N}_{(\cdot)}(0,1; k)$, random uniform $\mathcal{U}_{(\cdot)}(0,1; k)$, and truncated normal $\mathcal{T}_{(\cdot)}(0,1; k)$ distributions. The $x$-axis shows the seed value $k$; the $y$-axis shows execution time in seconds. The lines represent the median execution time based on 1000 repetitions, while the ribbons represent the range between the lower and upper quartiles.
  • ...and 25 more figures
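
The contrast that Figures 3 and 4 describe can be reproduced in a small sketch: 1024 two-dimensional points from a PRNG and from a Sobol' sequence, followed by an inverse-CDF transform of the Sobol' values into normal and truncated normal draws. Several things here are assumptions made for illustration: NumPy's generator stands in for the Keras/TensorFlow PRNG, SciPy's centered discrepancy is used as the uniformity measure, the half-cell shift is one simple way to keep values away from 0, and the truncation at two standard deviations follows the usual Keras convention. The paper's own procedure may differ.

```python
import numpy as np
from scipy.stats import norm, qmc, truncnorm

n = 1024  # a power of two, matching the 1024 draws shown in Figure 3

# Pseudorandom points in the unit square.
prng_points = np.random.default_rng(0).random((n, 2))

# First 2**10 = 1024 points of an unscrambled two-dimensional Sobol' sequence.
sobol_points = qmc.Sobol(d=2, scramble=False).random_base2(10)

# Lower discrepancy means the points cover the unit square more evenly.
prng_disc = qmc.discrepancy(prng_points)
sobol_disc = qmc.discrepancy(sobol_points)
print(f"PRNG discrepancy:   {prng_disc:.2e}")
print(f"Sobol' discrepancy: {sobol_disc:.2e}")

# Inverse-CDF transform of one Sobol' coordinate. The unscrambled sequence
# starts at exactly 0, where the normal quantile is -inf, so shift every
# value by half a cell to stay strictly inside (0, 1).
u = sobol_points[:, 0] + 0.5 / n
normal_draws = norm.ppf(u)                     # standard normal
truncated_draws = truncnorm.ppf(u, -2.0, 2.0)  # truncated at +/- 2 std. dev.
print(normal_draws[:4], truncated_draws[:4])
```

The uniform draws are the Sobol' values themselves; the normal and truncated normal draws inherit their even coverage through the quantile function, which is the standard way to turn a low-discrepancy sequence on $[0, 1)$ into samples from another distribution.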