Evolutionary Generation of Random Surreal Numbers for Benchmarking
Matthew Roughan
TL;DR
This work introduces an evolutionary synthesis method that generates random ensembles of surreal numbers with controlled complexity, enabling robust benchmarking of recursive algorithms and of network-like data. Treating surreal numbers as DAGs and using clade-based evolution with a Poisson-distributed number of parents and a generation-weighting scheme, the author derives a new two-parameter distribution Su$(\lambda, \alpha)$ for the generation $g(x)$ of the resulting numbers. It has the closed-form CDF $P(g(x)\le k)=e^{-\lambda \alpha^{k}}$, and differencing the CDF gives the PMF $P(g(x)=0)=e^{-\lambda}$ and $P(g(x)=k)=e^{-\lambda \alpha^{k}} - e^{-\lambda \alpha^{k-1}}$ for $k\ge1$. Empirical results show that generation and graph statistics converge, describe the structure of the final ensemble (roughly geometric tails, quadratic growth of the number of nodes with generation, and a linear relation between nodes and edges), and reveal how the split-point distribution shapes the prevalence of integers via Simon’s Extra Option Theorem. The approach yields a practical benchmark data generator for surreal-number computations and for other DAG-like data, with open-source code and clear avenues for extending the synthesis to other constrained networks.
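The closed-form CDF and PMF of Su$(\lambda, \alpha)$ stated above are easy to check numerically. The Python below is a minimal illustrative sketch, not the author's released code: the names `su_cdf`, `su_pmf` and `su_sample` are hypothetical, and it assumes $\lambda > 0$ and $0 < \alpha < 1$, the range in which $e^{-\lambda\alpha^k}$ increases towards 1 as $k$ grows. Sampling uses plain inverse-CDF search over $k$.

```python
import math
import random


def _check(lam: float, alpha: float) -> None:
    # Assumed parameter range: lam > 0 and 0 < alpha < 1, so that the
    # CDF exp(-lam * alpha**k) increases towards 1 as k grows.
    if not (lam > 0 and 0 < alpha < 1):
        raise ValueError("expected lam > 0 and 0 < alpha < 1")


def su_cdf(k: int, lam: float, alpha: float) -> float:
    """P(g(x) <= k) = exp(-lam * alpha**k) for integer k >= 0."""
    _check(lam, alpha)
    if k < 0:
        return 0.0
    return math.exp(-lam * alpha ** k)


def su_pmf(k: int, lam: float, alpha: float) -> float:
    """P(g(x) = k): exp(-lam) at k = 0, otherwise F(k) - F(k - 1)."""
    _check(lam, alpha)
    if k < 0:
        return 0.0
    if k == 0:
        return math.exp(-lam)
    return su_cdf(k, lam, alpha) - su_cdf(k - 1, lam, alpha)


def su_sample(lam: float, alpha: float, rng=random) -> int:
    """Draw one generation by inverse-CDF search: smallest k with F(k) >= u."""
    _check(lam, alpha)
    u = rng.random()  # uniform on [0, 1)
    k = 0
    while su_cdf(k, lam, alpha) < u:
        k += 1
    return k


# Partial sums of the PMF telescope to F(K), so they approach 1.
total = sum(su_pmf(k, 2.0, 0.5) for k in range(60))
print(abs(total - 1.0) < 1e-12)  # -> True
print([su_sample(2.0, 0.5, random.Random(1)) for _ in range(5)])
```

Because each PMF term for $k \ge 1$ is a difference of consecutive CDF values, the sum up to $K$ collapses to $e^{-\lambda\alpha^K}$, which is why the check above converges to 1 so quickly when $\alpha$ is well below 1.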
Abstract
There are many areas of scientific endeavour where large, complex datasets are needed for benchmarking. Evolutionary computing provides a means of creating such sets. As a case study, we consider Conway's surreal numbers. They have largely been treated as a theoretical construct, with little effort towards empirical study, at least in part because of the difficulty of working with all but the smallest numbers. Changing this requires efficient algorithms, and developing those algorithms requires benchmark datasets of surreal numbers. In this paper, we present a method for generating ensembles of random surreal numbers to benchmark algorithms. The approach uses an evolutionary algorithm to create benchmark datasets whose features we can analyse and control. Ultimately, the process is designed to generate networks with defined properties, and we expect it to be useful for other types of network data.
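The summary describes treating surreal numbers as DAGs, and the abstract frames the output as networks. As a minimal sketch of that view, assuming that a number's generation is Conway's birthday (0 for $\{\,|\,\}$, otherwise one more than the largest generation among its options), the Python below stores each form $\{L \mid R\}$ as a node whose options refer to shared nodes. The class `Form` and the functions `generation` and `dag_size` are hypothetical names, not the paper's code. The sketch does not check the ordering condition that every left option is less than every right option, and it merges only structurally identical forms, not forms with equal value.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Form:
    """A surreal-number form {L | R} (hypothetical representation).

    Options are references to other Form nodes, so a node used as an option
    by several parents is stored once and the structure is a DAG. Equality
    is structural: identical forms merge, equal-valued forms do not. The
    ordering condition (each left option < each right option) is not checked.
    """
    left: tuple = ()
    right: tuple = ()


@lru_cache(maxsize=None)
def generation(x: Form) -> int:
    """Birthday: 0 for { | }, else 1 + the largest generation among options."""
    options = x.left + x.right
    if not options:
        return 0
    return 1 + max(generation(o) for o in options)


def dag_size(x: Form) -> tuple[int, int]:
    """Count distinct nodes and node->option edges reachable from x."""
    seen = set()
    edges = 0
    stack = [x]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # shared node already counted
        seen.add(node)
        options = node.left + node.right
        edges += len(options)
        stack.extend(options)
    return len(seen), edges


zero = Form()                            # { | }
one = Form(left=(zero,))                 # { 0 | }
half = Form(left=(zero,), right=(one,))  # { 0 | 1 }
print(generation(half))  # -> 2
print(dag_size(half))    # -> (3, 3)
```

Sharing the node for 0 between 1 and 1/2 is what keeps the node count at 3 rather than the 4 a tree representation would need; node and edge counts of this kind are the graph statistics the summary refers to.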
