Algorithms to Uniformly Generate Random Factored Smooth Integers
Eric Bach, Jonathan Sorenson
TL;DR
This work addresses uniform random generation of $y$-smooth integers $\le x$ together with their full prime factorizations, where $\Psi(x,y)$ denotes the number of such integers. It builds a lexicographic enumeration via Buchstab's identity and then selects the entry at a position determined by a random input $r$, giving an exactly uniform sampler that uses $O\big(\Psi(x,y)\log\log y\big)$ arithmetic operations. To improve practicality, the authors introduce pruning and several speedups that relax exact uniformity and rely on heuristic assumptions, reaching $O\big(\frac{(\log x)^3}{\log\log x}\big)$ arithmetic operations. They also discuss average- vs worst-case analyses, special cases, and extensions to semismooth numbers. An example run demonstrates generating a $10^4$-smooth integer up to $10^{100}$ in under a second, illustrating the method’s potential for fast, factorization-aware sampling in large domains. The results offer a spectrum of exact and heuristic techniques for uniform random generation of smooth integers, with controllable tradeoffs between rigor, speed, and precomputation cost.
Abstract
Let $x\ge y>0$ be integers. A positive integer is $y$-smooth if all its prime divisors are at most $y$. Let $\Psi(x,y)$ count the number of $y$-smooth integers up to $x$. We present several algorithms that will generate an integer $n\le x$ at random, with known prime factorization, such that $n$ is $y$-smooth. We begin by describing algorithms to compute $\Psi(x,y)$ exactly and to enumerate $y$-smooth integers up to $x$ in lexicographic order by prime divisor. Both of these are based on Buchstab's identity, and were likely known before. Then we present an algorithm that accepts as input a parameter $r$, $0\le r<1$, and returns the integer $n$ that is at position $\lfloor r\Psi(x,y)\rfloor$ in the lexicographic ordering of all $y$-smooth integers up to $x$. Here position 0 is the first position. Thus, $n$ is generated uniformly so long as $r$ is chosen uniformly. This algorithm has a running time of $O(\Psi(x,y)\log\log y)$ arithmetic operations. We then explore the tradeoff between speed and rigor. By relaxing the uniformity of the output and allowing for multiple heuristics in our runtime analysis, we improve the running time to $$ O\left( \frac{ (\log x)^3 }{\log\log x} \right)$$ arithmetic operations. We conclude with a sample run by generating a $10000$-smooth integer $\le 10^{100}$.
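To make the counting and position-based selection concrete, here is a minimal Python sketch. It assumes the common form of Buchstab's identity, $\Psi(x,y) = 1 + \sum_{p\le y} \Psi(\lfloor x/p\rfloor, p)$, in which the term for $p$ counts the smooth integers whose largest prime factor is exactly $p$. The helper names (`primes_up_to`, `smooth_tools`, `psi`, `count`, `unrank`) are hypothetical, and the ordering used (group by largest prime factor, then recurse) is one possible lexicographic ordering that may differ from the one in the paper. The sketch is a naive memoized recursion meant for small inputs; it does not implement the paper's pruning or heuristic speedups.

```python
import math
import random
from functools import lru_cache


def primes_up_to(y):
    """Return all primes <= y using a simple sieve of Eratosthenes."""
    sieve = [True] * (y + 1)
    primes = []
    for i in range(2, y + 1):
        if sieve[i]:
            primes.append(i)
            for j in range(i * i, y + 1, i):
                sieve[j] = False
    return primes


def smooth_tools(y):
    """Build (count, unrank) helpers for y-smooth integers (hypothetical API)."""
    primes = primes_up_to(y)

    @lru_cache(maxsize=None)
    def psi(x, k):
        # Count integers in [1, x] whose prime factors all lie in primes[:k].
        if x < 1:
            return 0
        total = 1  # the integer 1
        for i in range(k):
            p = primes[i]
            if p > x:
                break
            # Integers whose largest prime factor is exactly p.
            total += psi(x // p, i + 1)
        return total

    def count(x):
        """Psi(x, y): the number of y-smooth integers <= x."""
        return psi(x, len(primes))

    def unrank(x, idx):
        """Prime factors (largest first) of the smooth integer at position idx."""
        if not 0 <= idx < count(x):
            raise IndexError("position out of range")
        k = len(primes)
        factors = []
        while idx > 0:  # position 0 of every block is the integer 1
            idx -= 1
            for i in range(k):
                p = primes[i]
                block = psi(x // p, i + 1)
                if idx < block:
                    # The target has largest remaining prime factor p.
                    factors.append(p)
                    x, k = x // p, i + 1
                    break
                idx -= block  # skip the whole block for this prime
        return factors

    return count, unrank


if __name__ == "__main__":
    count, unrank = smooth_tools(7)
    x = 10**6
    # randrange(Psi) stands in for floor(r * Psi) with r uniform in [0, 1);
    # it avoids floating-point rounding when Psi is large.
    idx = random.randrange(count(x))
    factors = unrank(x, idx)
    print(math.prod(factors), factors)
```

Because `unrank` is a bijection from positions $0,\dots,\Psi(x,y)-1$ to $y$-smooth integers $\le x$, a uniformly chosen position yields a uniformly chosen smooth integer along with its factorization, which is the idea behind the exact algorithm described above.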
