Deterministic Retrieval at Scale: Optimal-Space LCP Indexing and 308x Energy Reduction on Modern GPUs

Stanislav Byriukov

TL;DR

This work tackles deterministic top-$k$ retrieval under Longest Common Prefix (LCP) similarity for datasets of $N$ sequences of length $L$. It proves a space lower bound and delivers a trie-based index that uses $O(NL)$ space with $O(L+k)$ query time. It introduces Thermal-Aware Logic (TAL) to transform prefix structure into energy-efficient, range-bounded scans, achieving a $308\times$ energy reduction and a $329\times$ p95 latency improvement on a 20M-item benchmark on NVIDIA GPUs while maintaining near-peak utilization. The paper also establishes a determinism guarantee for all executions and provides extensive hardware validation, showing that the LCP-Index both preserves exact results and operates efficiently enough for safety-critical, edge, and distributed systems. The combination of provable optimality, deterministic retrieval, and substantial energy savings has direct implications for certification-compliant memory retrieval in aerospace, automotive, and multi-agent robotics contexts. These results advance a practical deterministic primitive for scalable similarity retrieval where approximate methods are unacceptable, and the work emphasizes a path toward reproducible, certifiable high-performance retrieval primitives.

Abstract

We study deterministic top-$k$ retrieval under Longest Common Prefix (LCP) similarity for $N$ sequences of length $L$. We prove a tight $\Omega(N)$ space lower bound (cell-probe model) and present a trie-based index using $O(NL)$ space with $O(L+k)$ query time. We contrast this with pairwise materialization ($\Theta(N^2)$), which hits a practical OOM wall at scale, while our indexed approach remains $O(N)$ in memory. We then introduce Thermal-Aware Logic (TAL), which turns prefix structure into range-bounded scans. In hardware measurements, TAL reduces energy per query by $308\times$ (0.0145 J vs 4.46 J) and cuts p95 latency by $329\times$ (0.114 ms vs 37.5 ms) on a 20M-item range-scan benchmark, while sustaining near-peak utilization (~99%) under long runs. The result is a deterministic retrieval primitive with receipts in regimes where approximate methods are unacceptable.
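To make the trie-based index in the abstract concrete, here is a minimal CPU sketch of exact top-$k$ LCP retrieval. It is an illustration under stated assumptions, not the paper's implementation: the names `TrieNode`, `LCPIndex`, and `top_k` are hypothetical, sequences are assumed to be equal-length strings, and ties are broken by sorted symbol order and then insertion order so that repeated runs return the same list. The sketch does not try to match the $O(L+k)$ query bound.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class TrieNode:
    """One trie node: children keyed by symbol, plus ids of sequences ending here."""
    __slots__ = ("children", "ids")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.ids: List[int] = []


class LCPIndex:
    """Hypothetical trie index over equal-length sequences (illustrative only)."""

    def __init__(self, sequences: Sequence[str]) -> None:
        self.root = TrieNode()
        for seq_id, seq in enumerate(sequences):
            node = self.root
            for symbol in seq:
                node = node.children.setdefault(symbol, TrieNode())
            node.ids.append(seq_id)

    def top_k(self, query: str, k: int) -> List[Tuple[int, int]]:
        """Return up to k (seq_id, lcp_length) pairs, longest LCP first."""
        # Walk down while the query matches, remembering every node on the path.
        path = [self.root]
        for symbol in query:
            child = path[-1].children.get(symbol)
            if child is None:
                break
            path.append(child)

        results: List[Tuple[int, int]] = []
        skip: Optional[TrieNode] = None  # subtree already enumerated one level deeper
        # Back off from the deepest matched node towards the root.
        for depth in range(len(path) - 1, -1, -1):
            self._collect(path[depth], depth, skip, k, results)
            if len(results) >= k:
                break
            skip = path[depth]
        return results[:k]

    @staticmethod
    def _collect(node: TrieNode, lcp: int, skip: Optional[TrieNode],
                 k: int, out: List[Tuple[int, int]]) -> None:
        # Iterative DFS; children are visited in sorted symbol order so the
        # output order does not depend on dictionary insertion order.
        stack = [node]
        while stack and len(out) < k:
            cur = stack.pop()
            out.extend((seq_id, lcp) for seq_id in cur.ids)
            for symbol in sorted(cur.children, reverse=True):
                child = cur.children[symbol]
                if child is not skip:
                    stack.append(child)


index = LCPIndex(["abcd", "abce", "abxx", "zzzz"])
print(index.top_k("abcf", 3))
```

Every sequence found while backing off from depth $d$ shares exactly $d$ leading symbols with the query, because the subtree matched one level deeper is skipped. This is why the results come out in non-increasing LCP order without any sorting step.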

Paper Structure (79 sections, 16 theorems, 21 equations, 2 figures, 11 tables, 6 algorithms)

Introduction
Motivation
Contributions
Paper Organization
Preliminaries
Notation and Definitions
Computational Model
Related Work
Approximate Nearest Neighbor Search
Locality-Sensitive Hashing (LSH).
Hierarchical Navigable Small World (HNSW).
Product Quantization and IVF.
Exact Nearest Neighbor
k-d Trees and Ball Trees.
Metric Trees.
...and 64 more sections

Key Result

Theorem 1

The distance $d$ satisfies the strong triangle inequality: for all $s, t, u \in \Sigma^L$, $d(s, u) \le \max\{d(s, t), d(t, u)\}$.
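The strong triangle inequality can be checked by brute force on a small alphabet. The sketch below assumes the distance has the form $d(s,t) = L - \mathrm{LCP}(s,t)$; the paper's Definition 3 is not reproduced in this summary and may normalise differently, so `lcp_distance` is a hypothetical stand-in. The check itself relies only on the fact that $\mathrm{LCP}(s,u) \ge \min\{\mathrm{LCP}(s,t), \mathrm{LCP}(t,u)\}$.

```python
from itertools import product
from typing import Iterable, List


def lcp(s: str, t: str) -> int:
    """Length of the longest common prefix of s and t."""
    n = 0
    for a, b in zip(s, t):
        if a != b:
            break
        n += 1
    return n


def lcp_distance(s: str, t: str) -> int:
    # Assumed form of the LCP-induced distance for equal-length strings.
    return len(s) - lcp(s, t)


def is_ultrametric(strings: Iterable[str]) -> bool:
    """True if d(s, u) <= max(d(s, t), d(t, u)) for every triple."""
    items: List[str] = list(strings)
    return all(
        lcp_distance(s, u) <= max(lcp_distance(s, t), lcp_distance(t, u))
        for s, t, u in product(items, repeat=3)
    )


# All 8 strings of length 3 over the alphabet {a, b}: 512 triples.
all_strings = ["".join(p) for p in product("ab", repeat=3)]
print(is_ultrametric(all_strings))
```

An exhaustive check on a toy alphabet is not a proof, but it is a cheap regression test for any implementation of the distance.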

Figures (2)

Figure 1: Memory scaling: materialization vs LCP-Index. The dashed line shows the H100 memory limit.
Figure 2: TAL energy reduction scales with bucket count, matching the theoretical $O(1/B)$ scaling.
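The $O(1/B)$ trend in Figure 2 has a simple counting analogue: once keys are sorted, every key sharing a prefix occupies one contiguous range, so a scan bounded to that range touches only a fraction of the data. The sketch below shows this on the CPU with binary search. It is not the paper's TAL mechanism, which this summary does not describe; `prefix_range` is a hypothetical helper, keys are assumed to be uniformly spread over $B$ prefix buckets, and the count of items touched is only a proxy for energy.

```python
from bisect import bisect_left
from itertools import product
from typing import List, Tuple

# Sentinel assumed to sort after every symbol that appears in a key.
MAX_SYMBOL = "\U0010ffff"


def prefix_range(sorted_keys: List[str], prefix: str) -> Tuple[int, int]:
    """Half-open index range [lo, hi) of keys that start with prefix."""
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + MAX_SYMBOL)
    return lo, hi


# 256 keys of length 4 over {a, b, c, d}; the first symbol defines B = 4 buckets.
keys = sorted("".join(p) for p in product("abcd", repeat=4))

lo, hi = prefix_range(keys, "b")
scanned = keys[lo:hi]                 # the range-bounded scan
fraction = len(scanned) / len(keys)   # 1/B when buckets are uniform
print(lo, hi, fraction)
```

A two-symbol prefix yields $B = 16$ buckets and a proportionally smaller range, which is the counting argument behind a $1/B$ curve. Skewed key distributions would break the uniformity assumption and change the constant.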

Theorems & Definitions (40)

Example 1: GNC Sensor Fusion
Definition 1: LCP Similarity
Definition 2: Top-$k$ LCP Retrieval
Definition 3: LCP-Induced Ultrametric
Theorem 1: Ultrametric Property
proof
Definition 4: Cell-Probe Complexity
Theorem 2: Space Lower Bound
proof
Theorem 3: Query Lower Bound
...and 30 more
