LLM-Guided Search for Deletion-Correcting Codes

Franziska Weindel, Reinhard Heckel

TL;DR

This work addresses the problem of constructing deletion-correcting codes of maximum size by recasting it as greedy independent-set construction in a subsequence-overlap graph, where a priority function decides the order in which sequences are added, and by searching for good priority functions with an LLM-guided evolutionary search (FunSearch). The authors adapt FunSearch with a deduplication step and distributed execution, and use prompted LLMs to propose priority functions that yield large deletion-correcting codes. Their results include optimal or near-optimal single-deletion constructions up to length $n=25$ and new lower bounds for two deletions at several lengths, alongside insights into the logic of the discovered priority functions. They show that priority-function-based approaches can generalize across code lengths and numbers of deletions, discuss the practical benefits and limitations of LLMs in combinatorial code design, and provide an open-source implementation to enable future research. Overall, the work highlights the potential of LLM-guided search for information theory and code design, offering a reusable framework and concrete new bounds, although evaluator cost limits scalability to long codes.

Abstract

Finding deletion-correcting codes of maximum size has been an open problem for over 70 years, even for a single deletion. In this paper, we propose a novel approach for constructing deletion-correcting codes. A code is a set of sequences satisfying certain constraints, and we construct it by greedily adding the highest-priority sequence according to a priority function. To find good priority functions, we leverage FunSearch, a large language model (LLM)-guided evolutionary search proposed by Romera et al., 2024. FunSearch iteratively generates, evaluates, and refines priority functions to construct large deletion-correcting codes. For a single deletion, our evolutionary search finds functions that construct codes which match known maximum sizes, reach the size of the largest (conjectured optimal) Varshamov-Tenengolts codes where the maximum is unknown, and independently rediscover them in equivalent form. For two deletions, we find functions that construct codes with new best-known sizes for code lengths $n = 12, 13$, and $16$, establishing improved lower bounds. These results demonstrate the potential of LLM-guided search for information theory and code design and represent the first application of such methods for constructing error-correcting codes.
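
The greedy construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a binary alphabet and uses the standard fact that a code corrects $s$ deletions exactly when the $s$-deletion balls of its codewords are pairwise disjoint. The names `deletion_ball`, `greedy_code`, and `vt_priority` are hypothetical, and `vt_priority` is a hand-written stand-in for the kind of function the search would evolve: it favors sequences $x$ with $\sum_{i=1}^{n} i\,x_i \equiv 0 \pmod{n+1}$, the defining condition of $\mathrm{VT}_0(n)$.

```python
from itertools import product


def deletion_ball(seq, s):
    """Return all distinct sequences obtained by deleting exactly s symbols."""
    ball = {seq}
    for _ in range(s):
        ball = {x[:i] + x[i + 1:] for x in ball for i in range(len(x))}
    return ball


def greedy_code(n, s, priority):
    """Greedily build a binary length-n code that corrects s deletions."""
    # Highest priority first; ties are broken lexicographically
    # (the tie-breaking rule is an assumption of this sketch).
    candidates = sorted(
        product((0, 1), repeat=n),
        key=lambda x: (-priority(x, n, s), x),
    )
    code, covered = [], set()
    for x in candidates:
        ball = deletion_ball(x, s)
        # Accept x only if its deletion ball is disjoint from every
        # ball of the codewords chosen so far.
        if covered.isdisjoint(ball):
            code.append(x)
            covered |= ball
    return code


def vt_priority(seq, n, s):
    """Hypothetical priority: 1.0 for VT_0(n) words, 0.0 otherwise."""
    syndrome = sum(i * b for i, b in enumerate(seq, start=1)) % (n + 1)
    return 1.0 if syndrome == 0 else 0.0


if __name__ == "__main__":
    for n in range(4, 9):
        print(n, len(greedy_code(n, 1, vt_priority)))
```

With this priority and $s = 1$, the greedy pass selects the $\mathrm{VT}_0(n)$ words first, so the result contains $\mathrm{VT}_0(n)$. Enumerating all $2^n$ candidates is only practical for short lengths, which is consistent with the evaluator-cost limitation noted in the TL;DR.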

Paper Structure

This paper contains 27 sections, 15 equations, 34 figures, 5 tables.

Figures (34)

  • Figure 1: FunSearch for finding deletion-correcting codes iteratively refines a priority function through evolutionary search guided by a pretrained LLM. In each iteration, a few-shot prompt is constructed by sampling from the program database. The LLM generates a new priority function, which is evaluated by greedily constructing deletion-correcting codes for different code lengths with a fixed or variable number of deletions. If executable and not a duplicate, the function is added to the database.
  • Figure 2: Baseline prompt.
  • Figure 3: Graph-based priority function that constructs codes with zero sequence overlap with the largest $\mathrm{VT}_0(n)$ codes for lengths $n = 7, 9, 11, 13$ while achieving the same code size.
  • Figure 4: Number-theoretic priority function that constructs the same codes as the largest $\mathrm{VT}_0(n)$ codes for lengths $n \in [6,11]$, but follows a different logic.
  • Figure 5: Sequence overlap between discovered optimal priority functions and the largest $\mathrm{VT}_0(n)$ codes for $n \in [6,16]$. Color denotes overlap bin; bar height the number of functions.
  • ...and 29 more figures

Theorems & Definitions (1)

  • Claim 1