Neural Prime Sieves: Density-Driven Generalization and Empirical Evidence for Hardy-Littlewood Asymptotics

Manik Kakkar

Abstract

Special prime families (twin, Sophie Germain, safe, cousin, sexy, Chen, and isolated primes) are central objects of analytic number theory, yet no efficiently computable probabilistic filter exists for identifying likely members among known primes at large scale. Classical sieves assign no probability weights to surviving candidates, and prior machine learning approaches are limited by the algorithmic randomness of the prime indicator sequence, yielding near-zero true positive rates. We present PrimeFamilyNet, a multi-head residual network conditioned on the backward prime gap and modular primorial residues of a known prime $p$, learning probabilistic filters for all seven families simultaneously and generalising across nine orders of magnitude from training ($10^7$--$10^9$) to evaluation at $10^{16}$. Isolated prime recall increased monotonically from $0.809$ at $5\times10^8$ to $0.984$ at $10^{16}$, a gain of $17.5$ percentage points and the only family among seven to improve with scale. Because recall is invariant to class prevalence, this reflects genuine decision boundary sharpening, not the rising isolated-prime fraction at extreme scales. A model trained only to $10^9$ reproduced the correct asymptotic direction without density supervision, corroborating Hardy--Littlewood $k$-tuple predictions. The causal model retained over $95\%$ recall for five families near $10^{10}$ while reducing the search space by $62$--$88\%$. For Chen primes, causal recall exceeded non-causal recall at every scale (margin $+0.245$ at $10^{16}$) because $g^+=2$ encodes only the prime case of the Chen condition. Focal Loss collapsed sparse algebraic family recall to $0.000$. Asymmetric Loss outperformed weighted BCE in-distribution but degraded more steeply out-of-distribution, showing that in-distribution recall alone is a misleading criterion for scale-generalisation tasks.
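A minimal sketch of the kind of architecture the abstract describes is given below, in PyTorch: a shared residual trunk over the backward-gap and primorial-residue features of a known prime, with one sigmoid head per family. The feature normalisation, widths, depths, and head names are illustrative assumptions, not the authors' exact configuration.

    # Hypothetical sketch in the spirit of PrimeFamilyNet: a shared residual trunk
    # with one binary head per prime family. Hyperparameters are illustrative.
    import torch
    import torch.nn as nn

    FAMILIES = ["twin", "sophie_germain", "safe", "cousin", "sexy", "chen", "isolated"]
    PRIMORIALS = [2, 6, 30, 210, 2310]  # 2#, 3#, 5#, 7#, 11#

    def features(p: int, prev_prime: int) -> torch.Tensor:
        """Backward prime gap plus residues of p modulo small primorials, lightly normalised."""
        gap = float(p - prev_prime)
        feats = [gap / 100.0] + [(p % q) / q for q in PRIMORIALS]
        return torch.tensor(feats, dtype=torch.float32)

    class ResidualBlock(nn.Module):
        def __init__(self, width: int):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(width, width), nn.ReLU(), nn.Linear(width, width))
            self.act = nn.ReLU()

        def forward(self, x):
            return self.act(x + self.body(x))  # skip connection

    class PrimeFamilyNet(nn.Module):
        def __init__(self, in_dim: int = 1 + len(PRIMORIALS), width: int = 128, depth: int = 4):
            super().__init__()
            self.stem = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU())
            self.trunk = nn.Sequential(*[ResidualBlock(width) for _ in range(depth)])
            # One probabilistic filter (head) per family, sharing the trunk representation.
            self.heads = nn.ModuleDict({f: nn.Linear(width, 1) for f in FAMILIES})

        def forward(self, x):
            h = self.trunk(self.stem(x))
            return {f: torch.sigmoid(head(h)).squeeze(-1) for f, head in self.heads.items()}

    if __name__ == "__main__":
        model = PrimeFamilyNet()
        # 1,000,003 is prime and 999,983 is the prime immediately below it (backward gap 20).
        x = features(1_000_003, 999_983).unsqueeze(0)
        print({name: float(prob) for name, prob in model(x).items()})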

Paper Structure

This paper contains 39 sections, 13 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Left: twin prime density fraction (blue) and isolated prime density fraction (orange) across evaluation scales, overlaid with the Hardy--Littlewood prediction $C_2/\!\log N$ (green dashed) calibrated at $5{\times}10^8$. The HL fit achieves $R^2 = 0.981$. Middle: corresponding model recall for twin and isolated primes across scales. The isolated recall curves mirror the density curves at every scale. Right: scatter of density fraction versus recall for twin (blue) and isolated (orange) primes across all five scales, with linear trend lines. The density-recall correlation is quantitatively strong: $R^2 = 0.991$ for isolated primes and $R^2 = 0.984$ for twin primes. Because recall is invariant to class prevalence [sokolova2009measures], the correlation reflects decision-boundary sharpening in lockstep with the density shift, not a mechanical effect of the changing class balance (a minimal sketch of the $C_2/\log N$ calibration follows this figure list).
  • Figure 2: Causal wBCE recall (left) and search-space reduction (right) across five evaluation scales for all seven prime families. Isolated primes (yellow crosses) are the only family whose recall sloped upward, mirroring the rising isolated-prime fraction in Table \ref{tab:density}. Safe primes (green triangles) collapsed above $10^{14}$. All other families decayed smoothly. The $90\%$ recall reference (dashed) and $77\%$ sieve baseline (dotted) are shown for context.
  • Figure 3: Recall of the causal model (solid) versus the non-causal upper bound (dashed) across scales. The non-causal model dominated gap-defined families at most scales. For Chen primes, the causal model exceeded non-causal recall at every scale, with the advantage growing to $+0.245$ at $10^{16}$, because $g^{+} = 2$ encodes only the prime case of the Chen condition and carries no information about the semiprime case. Sophie Germain primes showed a marginal causal advantage only at $5{\times}10^8$ and $10^{10}$, consistent with a weak forward-gap correlation that does not persist at extreme scales.
  • Figure 4: Model comparison at $10^{12}$ across recall (left), precision (centre), and search-space reduction (right). XGBoost achieved high recall through over-prediction (low precision) rather than generalised sieve structure. Focal Loss produced zero bars for Sophie Germain and safe primes. The $90\%$ recall reference and $77\%$ sieve baseline are shown.
  • Figure 5: Recall of ASL (dashed) versus wBCE (solid) across scales (left) and Brier score at $10^{12}$ (right). ASL led in-distribution for most families. For Sophie Germain and safe primes, recall under ASL collapsed to $0.023$ and $0.011$ at $10^{16}$, whereas wBCE retained $0.601$ and $0.077$ respectively, illustrating that wBCE is more robust to distribution shift for families governed by linear-transform primality conditions. In-distribution recall is a misleading model-selection criterion for scale-generalisation tasks (a minimal sketch of the loss variants compared here follows this figure list).
  • ...and 1 more figure
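The Hardy--Littlewood calibration referenced in the Figure 1 caption can be reproduced along the following lines. This is a minimal sketch: the twin-prime constant $C_2$ is standard, but the empirical twin-prime fractions below are placeholders rather than the paper's measured values, and the single-amplitude calibration at $5{\times}10^8$ is an assumed convention.

    # Fit the Hardy--Littlewood-shaped curve A * C2 / log(N) to a (placeholder)
    # empirical twin-prime fraction at the calibration scale 5e8, then compare
    # it with the other evaluation scales and report an R^2.
    import math

    C2 = 0.6601618158  # Hardy--Littlewood twin-prime constant

    SCALES = [5e8, 1e10, 1e12, 1e14, 1e16]
    # Placeholder fractions of sampled primes that are twin primes (NOT the paper's data).
    EMPIRICAL_TWIN_FRACTION = {5e8: 0.066, 1e10: 0.057, 1e12: 0.048, 1e14: 0.042, 1e16: 0.037}

    def hl_prediction(N: float, amplitude: float) -> float:
        """Hardy--Littlewood-shaped density curve with the constants absorbed into one amplitude."""
        return amplitude * C2 / math.log(N)

    # Calibrate the amplitude so the prediction matches the observed fraction at 5e8.
    N0 = 5e8
    amplitude = EMPIRICAL_TWIN_FRACTION[N0] * math.log(N0) / C2

    preds = [hl_prediction(N, amplitude) for N in SCALES]
    obs = [EMPIRICAL_TWIN_FRACTION[N] for N in SCALES]
    mean_obs = sum(obs) / len(obs)
    r2 = 1 - sum((o - p) ** 2 for o, p in zip(obs, preds)) / sum((o - mean_obs) ** 2 for o in obs)
    print(f"calibrated amplitude: {amplitude:.3f}, R^2 of HL fit: {r2:.3f}")
    for N, p, o in zip(SCALES, preds, obs):
        print(f"N = {N:.0e}: predicted {p:.4f}, observed {o:.4f}")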
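The loss variants compared in the abstract and in the Figure 5 caption (weighted BCE, Focal Loss, and Asymmetric Loss) are standard imbalanced-classification losses; a minimal sketch of each, operating on per-family probabilities, is given below. The hyperparameter values (positive weight, focusing parameters, probability-shifting margin) are illustrative defaults, not the paper's settings.

    # Hedged sketches of the three loss variants: weighted BCE, Focal Loss
    # (Lin et al., 2017), and Asymmetric Loss (Ridnik et al., 2021).
    import torch

    def weighted_bce(p, y, pos_weight=10.0, eps=1e-7):
        """Binary cross-entropy with an up-weighted positive class (rare prime families)."""
        p = p.clamp(eps, 1 - eps)
        return -(pos_weight * y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

    def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
        """Focal Loss: down-weights easy examples via the (1 - p_t)**gamma modulating factor."""
        p = p.clamp(eps, 1 - eps)
        pos = alpha * (1 - p) ** gamma * torch.log(p)
        neg = (1 - alpha) * p ** gamma * torch.log(1 - p)
        return -(y * pos + (1 - y) * neg).mean()

    def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-7):
        """Asymmetric Loss: separate focusing for positives and negatives plus probability shifting."""
        p = p.clamp(eps, 1 - eps)
        p_shifted = (p - margin).clamp(min=eps)  # shifted probability used for the negative term
        pos = (1 - p) ** gamma_pos * torch.log(p)
        neg = p_shifted ** gamma_neg * torch.log((1 - p_shifted).clamp(min=eps))
        return -(y * pos + (1 - y) * neg).mean()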