GoldbachGPU: An Open Source GPU-Accelerated Framework for Verification of Goldbach's Conjecture

Isaac Llorente-Saguer

TL;DR

A dense bit-packed prime representation reduces the memory footprint 16x, and a segmented double-sieve design removes the VRAM ceiling entirely. Together they enable exhaustive verification up to 10^12 on a single NVIDIA RTX 3070 (8 GB VRAM), with no counterexamples found.
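To make the 16x figure concrete, here is a minimal C++ sketch of one layout that would produce it: one bit per odd integer instead of one byte per integer. The name `OddBitset` and the 32-bit word layout are assumptions for illustration, not taken from the GoldbachGPU source, and the paper may arrive at the 16x factor differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical odd-only, bit-packed prime table.
// Bit i stands for the odd number 2*i + 1; even numbers are never stored.
struct OddBitset {
    std::uint64_t limit;               // largest integer represented (inclusive)
    std::vector<std::uint32_t> words;  // 32 odd numbers per 32-bit word

    explicit OddBitset(std::uint64_t n)
        : limit(n), words((n / 2 + 1 + 31) / 32, 0xFFFFFFFFu) {
        clear(1);  // 1 is not prime
        // Sieve of Eratosthenes restricted to odd numbers.
        for (std::uint64_t p = 3; p * p <= n; p += 2) {
            if (!test_odd(p)) continue;
            for (std::uint64_t m = p * p; m <= n; m += 2 * p) clear(m);
        }
    }

    // Reads the bit for an odd number m.
    bool test_odd(std::uint64_t m) const {
        std::uint64_t i = m >> 1;
        return (words[i >> 5] >> (i & 31)) & 1u;
    }

    // Marks an odd number m as composite.
    void clear(std::uint64_t m) {
        std::uint64_t i = m >> 1;
        words[i >> 5] &= ~(1u << (i & 31));
    }

    // Primality lookup for any m in [0, limit].
    bool is_prime(std::uint64_t m) const {
        assert(m <= limit);
        if (m == 2) return true;
        if (m < 2 || (m & 1) == 0) return false;
        return test_odd(m);
    }

    // Storage actually used by the table, in bytes.
    std::size_t bytes() const { return words.size() * sizeof(std::uint32_t); }
};
```

Under these assumptions, a table covering 0..10^6 occupies about 62.5 KB, against roughly 1 MB for a one-byte-per-integer array, which is where a factor of about 16 would come from.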

Abstract

We present GoldbachGPU, an open-source framework for large-scale computational verification of Goldbach's conjecture using commodity GPU hardware. Prior GPU-based approaches reported a hard memory ceiling near 10^11 due to monolithic prime-table allocation. We show that this limitation is architectural rather than fundamental: a dense bit-packed prime representation provides a 16x reduction in memory footprint, and a segmented double-sieve design removes the VRAM ceiling entirely. By inverting the verification loop and combining a GPU fast-path with a multi-phase primality oracle, the framework achieves exhaustive verification up to 10^12 on a single NVIDIA RTX 3070 (8 GB VRAM), with no counterexamples found. Each segment requires 14 MB of VRAM, yielding O(N) wall-clock time and O(1) memory in N. A rigorous CPU fallback guarantees mathematical completeness, though it was never invoked in practice. An arbitrary-precision checker using GMP and OpenMP extends single-number verification to 10^10000 via a synchronised batch-search strategy. The segmented architecture also exhibits clean multi-GPU scaling on data-centre hardware (tested on 8 x H100). All code is open-source, documented, and reproducible on both commodity and high-end hardware.
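The O(1)-memory claim rests on segmentation: one fixed-size buffer is sieved, consumed, and reused for each window of the range. The C++ sketch below shows only that basic idea on the CPU. The function names and the use of `std::vector<bool>` are assumptions for illustration; the "double-sieve" detail and the GPU verification step are not spelled out in the abstract and are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// Plain sieve producing all primes <= limit (the "base primes").
std::vector<u64> base_primes(u64 limit) {
    std::vector<bool> composite(limit + 1, false);
    std::vector<u64> primes;
    for (u64 p = 2; p <= limit; ++p) {
        if (composite[p]) continue;
        primes.push_back(p);
        for (u64 m = p * p; m <= limit; m += p) composite[m] = true;
    }
    return primes;
}

// Counts primes in [2, n] with a single buffer of segment_len flags.
// The buffer size does not depend on n. Assumes n is far below 2^64
// so that the index arithmetic cannot overflow.
u64 count_primes_segmented(u64 n, u64 segment_len) {
    assert(segment_len > 0);
    if (n < 2) return 0;

    // Integer square root of n, corrected for floating-point rounding.
    u64 root = static_cast<u64>(std::sqrt(static_cast<long double>(n)));
    while (root * root > n) --root;
    while ((root + 1) * (root + 1) <= n) ++root;

    const std::vector<u64> primes = base_primes(root);
    std::vector<bool> is_composite(segment_len);  // allocated once, reused

    u64 count = 0;
    for (u64 lo = 2; lo <= n; lo += segment_len) {
        const u64 hi = std::min(lo + segment_len - 1, n);
        std::fill(is_composite.begin(), is_composite.end(), false);
        for (u64 p : primes) {
            if (p * p > hi) break;
            // First multiple of p inside [lo, hi], never below p*p.
            const u64 start = std::max(p * p, (lo + p - 1) / p * p);
            for (u64 m = start; m <= hi; m += p) is_composite[m - lo] = true;
        }
        for (u64 m = lo; m <= hi; ++m) count += is_composite[m - lo] ? 0 : 1;
    }
    return count;
}
```

For example, `count_primes_segmented(1000000, 4096)` walks 245 segments with the same 4096-flag buffer. Raising `n` adds more segments, and therefore more time, but no extra buffer memory beyond the base-prime list, which grows only with the square root of `n`.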

Paper Structure (29 sections, 2 equations, 2 figures, 7 tables)


Figures (2)

  • Figure 1: Primality-oracle selection in goldbach_gpu3. For each even integer $n$ in the current segment $[A,B]$, the verifier iterates over candidate primes $p$ and computes $q = n - p$. The primality of $q$ is resolved by the first applicable oracle in a strict priority chain designed to minimize GPU compute cost: (1) the resident 122 KB small-primes bitset when $q \le P_{\text{SMALL}}$ (where $P_{\text{SMALL}}$ is the upper bound of the permanently resident small-primes table); (2) the 14 MB segment bitset, pre-sieved on the CPU, when $q \in [A,B]$; or (3) a deterministic 64-bit Miller--Rabin test for the transitional region $P_{\text{SMALL}} < q < A$. This three-way structure guarantees complete coverage of all possible $q$ values without requiring a monolithic prime table to reside in VRAM.
  • Figure 2: Log--log runtime scaling. CPU baseline (black dashed), goldbach_gpu2 (blue, global bitset, limited to $10^{10}$ in practice), and goldbach_gpu3 (red, segmented, no VRAM ceiling). All three scale approximately linearly. At limits that both GPU tools can reach, goldbach_gpu3 is slower because of per-segment overhead, but it is the only tool capable of reaching $10^{12}$.
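The priority chain described in the Figure 1 caption can be sketched on the CPU as follows. This is a hedged illustration, not the goldbach_gpu3 kernel: the `Tables` struct, `select_oracle`, `is_prime_q`, and `build_tables` are hypothetical names, `std::vector<bool>` stands in for the two bitsets, and the Miller--Rabin base set (the first twelve primes, which is sufficient for all 64-bit inputs) is an assumption, since the caption does not say which bases the framework uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension for 64x64-bit products

static u64 mul_mod(u64 a, u64 b, u64 m) { return static_cast<u64>((u128)a * b % m); }

static u64 pow_mod(u64 base, u64 exp, u64 m) {
    u64 r = 1;
    base %= m;
    while (exp) {
        if (exp & 1) r = mul_mod(r, base, m);
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    return r;
}

// Deterministic Miller-Rabin for 64-bit n (bases: first twelve primes).
bool miller_rabin_u64(u64 n) {
    if (n < 2) return false;
    static const u64 bases[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
    for (u64 p : bases) if (n % p == 0) return n == p;
    u64 d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; ++s; }
    for (u64 a : bases) {
        u64 x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int r = 1; r < s; ++r) {
            x = mul_mod(x, x, n);
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}

enum class Oracle { SmallTable, SegmentTable, MillerRabin };

struct Tables {
    u64 p_small;                // upper bound of the resident small table
    std::vector<bool> small;    // small[q] is true iff q is prime, q <= p_small
    u64 A, B;                   // current segment [A, B]
    std::vector<bool> segment;  // segment[q - A] is true iff q is prime
};

// First applicable oracle wins, in the order given by the caption.
Oracle select_oracle(const Tables& t, u64 q) {
    if (q <= t.p_small) return Oracle::SmallTable;
    if (q >= t.A && q <= t.B) return Oracle::SegmentTable;
    return Oracle::MillerRabin;  // transitional region p_small < q < A
}

bool is_prime_q(const Tables& t, u64 q) {
    switch (select_oracle(t, q)) {
        case Oracle::SmallTable:   return t.small[q];
        case Oracle::SegmentTable: return t.segment[q - t.A];
        default:                   return miller_rabin_u64(q);
    }
}

// Reference primality check, used here only to fill the toy tables.
bool trial_division(u64 q) {
    if (q < 2) return false;
    for (u64 d = 2; d * d <= q; ++d) if (q % d == 0) return false;
    return true;
}

Tables build_tables(u64 p_small, u64 A, u64 B) {
    Tables t{p_small, std::vector<bool>(p_small + 1), A, B,
             std::vector<bool>(B - A + 1)};
    for (u64 q = 0; q <= p_small; ++q) t.small[q] = trial_division(q);
    for (u64 q = A; q <= B; ++q) t.segment[q - A] = trial_division(q);
    return t;
}
```

With `build_tables(100, 10000, 10200)`, the value q = 97 is answered by the small table, q = 10007 by the segment table, and q = 5003 falls through to Miller--Rabin. Because q = n - p with n in [A, B], every q lands in exactly one of the three cases, which is the coverage property the caption describes.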