Deterministic Retrieval at Scale: Optimal-Space LCP Indexing and 308x Energy Reduction on Modern GPUs

Stanislav Byriukov

TL;DR

This work tackles deterministic top-$k$ retrieval under Longest Common Prefix (LCP) similarity for datasets of $N$ sequences of length $L$, proving an $\Omega(N)$ space lower bound and delivering a trie-based index that uses $O(NL)$ space with $O(L+k)$ query time. It introduces Thermal-Aware Logic (TAL) to transform prefix structure into energy-efficient, range-bounded scans, achieving a $308\times$ reduction in energy per query and a $329\times$ improvement in p95 latency on a 20M-item range-scan benchmark on NVIDIA GPUs while maintaining near-peak utilization. The authors also establish a determinism guarantee for all executions and provide extensive hardware validation, showing that the LCP-Index both preserves exact results and operates efficiently enough for safety-critical, edge, and distributed systems. The combination of provable optimality, deterministic retrieval, and substantial energy savings has direct implications for certification-compliant memory retrieval in aerospace, automotive, and multi-agent robotics contexts. These results advance a practical deterministic primitive for scalable similarity retrieval where approximate methods are unacceptable, and the work emphasizes a path toward reproducible, certifiable high-performance retrieval primitives.

Abstract

We study deterministic top-$k$ retrieval under Longest Common Prefix (LCP) similarity for $N$ sequences of length $L$. We prove a tight $\Omega(N)$ space lower bound (cell-probe model) and present a trie-based index using $O(NL)$ space with $O(L+k)$ query time. We contrast this with pairwise materialization ($\Theta(N^2)$), which hits a practical OOM wall at scale, while our indexed approach remains $O(N)$ in memory. We then introduce Thermal-Aware Logic (TAL), which turns prefix structure into range-bounded scans. In hardware measurements, TAL reduces energy per query by $308\times$ (0.0145 J vs 4.46 J) and cuts p95 latency by $329\times$ (0.114 ms vs 37.5 ms) on a 20M-item range-scan benchmark, while sustaining near-peak utilization (~99%) under long runs. The result is a deterministic retrieval primitive with receipts in regimes where approximate methods are unacceptable.
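
To make the trie-based index concrete, here is a minimal Python sketch of top-$k$ LCP retrieval. It is an illustration under stated assumptions, not the paper's implementation: the names `LCPIndex`, `top_k`, and `_collect` are hypothetical, sequences are plain strings, and ties at equal LCP are broken by sorted symbol order so that repeated runs return the same list. The uncompressed trie stores at most $NL$ nodes, but this simple traversal can spend up to $O(L)$ steps per reported item, so it does not by itself reach the stated $O(L+k)$ query bound.

```python
from typing import Dict, List, Sequence, Tuple


class _Node:
    """One trie node; `ids` lists the sequences that end exactly here."""
    __slots__ = ("children", "ids")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.ids: List[int] = []


class LCPIndex:
    """Hypothetical uncompressed trie over N sequences (at most N*L nodes)."""

    def __init__(self, sequences: Sequence[str]) -> None:
        self.root = _Node()
        for seq_id, seq in enumerate(sequences):
            node = self.root
            for symbol in seq:
                node = node.children.setdefault(symbol, _Node())
            node.ids.append(seq_id)

    def top_k(self, query: str, k: int) -> List[Tuple[int, int]]:
        """Return up to k (seq_id, lcp) pairs, longest common prefix first."""
        # Walk down the trie along the query, remembering every node visited.
        path = [self.root]
        for symbol in query:
            child = path[-1].children.get(symbol)
            if child is None:
                break
            path.append(child)

        results: List[Tuple[int, int]] = []
        skip = None  # subtree already reported at a deeper level
        # Items under path[depth] but outside the deeper subtree share
        # exactly `depth` leading symbols with the query.
        for depth in range(len(path) - 1, -1, -1):
            self._collect(path[depth], skip, depth, k, results)
            if len(results) >= k:
                break
            skip = path[depth]
        return results[:k]

    @staticmethod
    def _collect(node, skip, lcp, k, out) -> None:
        # Iterative DFS in sorted symbol order, so output order is reproducible.
        stack = [node]
        while stack and len(out) < k:
            cur = stack.pop()
            out.extend((seq_id, lcp) for seq_id in cur.ids)
            for symbol in sorted(cur.children, reverse=True):
                child = cur.children[symbol]
                if child is not skip:
                    stack.append(child)
```

For example, indexing `["abcd", "abce", "abxx", "zzzz"]` and querying `"abcd"` with `k=3` reports the exact match first, then the sequence sharing three symbols, then the one sharing two.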

Paper Structure

This paper contains 79 sections, 16 theorems, 21 equations, 2 figures, 11 tables, 6 algorithms.

Key Result

Theorem 1

The distance $d$ satisfies the strong triangle inequality: $d(s, u) \le \max\{d(s, t),\, d(t, u)\}$ for all $s, t, u \in \Sigma^L$.
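
The strong triangle inequality can be checked exhaustively on a small alphabet. The sketch below assumes the distance takes the form $d(s, t) = L - \mathrm{LCP}(s, t)$; the paper's Definition 3 is not reproduced in this summary, so that form is an assumption, as are the helper names `lcp`, `dist`, and `is_ultrametric`. The check relies on the fact that $\mathrm{LCP}(s, u) \ge \min\{\mathrm{LCP}(s, t), \mathrm{LCP}(t, u)\}$, which holds for any three sequences.

```python
from itertools import product


def lcp(s: str, t: str) -> int:
    """Length of the longest common prefix of s and t."""
    n = 0
    for a, b in zip(s, t):
        if a != b:
            break
        n += 1
    return n


def dist(s: str, t: str) -> int:
    # Assumed form of the LCP-induced distance for equal-length sequences.
    return len(s) - lcp(s, t)


def is_ultrametric(alphabet: str, length: int) -> bool:
    """Brute-force check of d(s,u) <= max(d(s,t), d(t,u)) over all triples."""
    words = ["".join(p) for p in product(alphabet, repeat=length)]
    return all(
        dist(s, u) <= max(dist(s, t), dist(t, u))
        for s in words
        for t in words
        for u in words
    )
```

Calling `is_ultrametric("ab", 4)` checks all 4096 triples over the 16 binary words of length 4.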

Figures (2)

  • Figure 1: Memory scaling: materialization vs LCP-Index. The dashed line shows the H100 memory limit.
  • Figure 2: TAL energy reduction scales with bucket count, matching theoretical $O(1/B)$.
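
The $O(1/B)$ scaling in Figure 2 is consistent with a simple picture of a range-bounded scan: when sequences are kept in sorted order, all sequences sharing a prefix occupy one contiguous range, so a query only touches that range instead of the whole dataset. The Python sketch below illustrates this on a CPU with a sorted list. It is not the paper's GPU implementation, the function names `prefix_range` and `scanned_fraction` are hypothetical, and treating each distinct prefix as one "bucket" is an assumption about what $B$ counts.

```python
from bisect import bisect_left
from typing import List, Tuple


def prefix_range(sorted_seqs: List[str], prefix: str) -> Tuple[int, int]:
    """Half-open index range [lo, hi) of sequences that start with `prefix`.

    Assumes `sorted_seqs` is sorted and the last symbol of `prefix`
    is not the maximum code point.
    """
    if not prefix:
        return 0, len(sorted_seqs)
    lo = bisect_left(sorted_seqs, prefix)
    # Smallest string that sorts after every string starting with `prefix`.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(sorted_seqs, upper, lo)
    return lo, hi


def scanned_fraction(sorted_seqs: List[str], prefix: str) -> float:
    """Share of the dataset a range-bounded scan for `prefix` would touch."""
    lo, hi = prefix_range(sorted_seqs, prefix)
    return (hi - lo) / len(sorted_seqs)
```

With $B$ equally populated buckets, the scanned fraction is $1/B$, so the work per query falls in proportion to the bucket count under this model.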

Theorems & Definitions (40)

  • Example 1: GNC Sensor Fusion
  • Definition 1: LCP Similarity
  • Definition 2: Top-$k$ LCP Retrieval
  • Definition 3: LCP-Induced Ultrametric
  • Theorem 1: Ultrametric Property
  • proof
  • Definition 4: Cell-Probe Complexity
  • Theorem 2: Space Lower Bound
  • proof
  • Theorem 3: Query Lower Bound
  • ...and 30 more