LLM-Guided Search for Deletion-Correcting Codes
Franziska Weindel, Reinhard Heckel
TL;DR
This work addresses the problem of constructing deletion-correcting codes of maximum size by reframing it as greedy independent-set construction in a subsequence-overlap graph and solving it with FunSearch, an LLM-guided evolutionary search. The authors adapt FunSearch with a deduplication step and distributed execution, and prompt LLMs to produce priority functions that yield large deletion-correcting codes. Their results include optimal or near-optimal single-deletion constructions up to length $n=25$ and new lower bounds for two deletions at several lengths, along with insights into the logic underlying the discovered priority functions. They show that priority-function-based approaches can generalize across code lengths and numbers of deletions, discuss the practical benefits and limitations of LLMs in combinatorial code design, and provide an open-source implementation to enable future research. Overall, the work highlights the potential of LLM-guided search for information theory and code design, offering concrete new bounds and a reusable framework whose scalability to long codes is limited by evaluator cost.
Abstract
Finding deletion-correcting codes of maximum size has been an open problem for over 70 years, even for a single deletion. In this paper, we propose a novel approach for constructing deletion-correcting codes. A code is a set of sequences satisfying certain constraints, and we construct it by greedily adding the highest-priority sequence according to a priority function. To find good priority functions, we leverage FunSearch, a large language model (LLM)-guided evolutionary search proposed by Romera-Paredes et al. (2024). FunSearch iteratively generates, evaluates, and refines priority functions to construct large deletion-correcting codes. For a single deletion, our evolutionary search finds functions that construct codes which match known maximum sizes and, where the maximum is unknown, reach the size of the largest (conjectured optimal) Varshamov-Tenengolts codes; some of these functions independently rediscover the Varshamov-Tenengolts codes in equivalent form. For two deletions, we find functions that construct codes with new best-known sizes for code lengths $n = 12$, $13$, and $16$, establishing improved lower bounds. These results demonstrate the potential of LLM-guided search for information theory and code design and represent the first application of such methods for constructing error-correcting codes.
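
To make the greedy construction described above concrete, the following is a minimal sketch for binary codes, not the authors' implementation. It assumes the standard characterization that a length-$n$ code corrects $s$ deletions exactly when no two codewords share a common subsequence of length $n-s$. The names `deletion_ball`, `greedy_code`, and `vt_priority` are hypothetical, and `vt_priority` is a hand-written stand-in for the kind of priority function that FunSearch would evolve: it simply favors sequences whose Varshamov-Tenengolts checksum is zero.

```python
from itertools import combinations, product


def deletion_ball(seq, s):
    """Return all subsequences of `seq` obtained by deleting exactly `s` symbols."""
    n = len(seq)
    return {tuple(seq[i] for i in keep) for keep in combinations(range(n), n - s)}


def greedy_code(n, s, priority):
    """Greedily build a binary code of length `n` that corrects `s` deletions.

    Sequences are visited in order of decreasing priority. A sequence is added
    only if its deletion ball is disjoint from the balls of all codewords chosen
    so far, which is the independent-set condition in the overlap graph.
    """
    candidates = sorted(product((0, 1), repeat=n), key=priority, reverse=True)
    code, covered = [], set()
    for seq in candidates:
        ball = deletion_ball(seq, s)
        if covered.isdisjoint(ball):
            code.append(seq)
            covered |= ball
    return code


def vt_priority(seq):
    """Illustrative priority (an assumption, not an evolved function).

    Scores 1.0 if sum_i i * x_i is divisible by n + 1 (positions 1-indexed),
    i.e. if the sequence belongs to the Varshamov-Tenengolts class VT_0(n).
    """
    n = len(seq)
    checksum = sum((i + 1) * bit for i, bit in enumerate(seq)) % (n + 1)
    return 1.0 if checksum == 0 else 0.0


if __name__ == "__main__":
    code = greedy_code(5, 1, vt_priority)
    print(len(code), code)
```

With this priority the greedy pass for $n = 5$ and one deletion selects the six sequences of the Varshamov-Tenengolts class with checksum zero and then rejects every remaining candidate. The sketch enumerates all $2^n$ sequences and their deletion balls, so it is only practical for short lengths, in line with the evaluator-cost limitation noted in the TL;DR.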
