Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms
Niki van Stein, Anna V. Kononova, Lars Kotthoff, Thomas Bäck
TL;DR
The paper addresses the problem of understanding how Large Language Models (LLMs) guide the automatic design and optimization of algorithms within evolutionary frameworks. It introduces Code Evolution Graphs (CEGs), which combine Abstract Syntax Tree (AST) features, static code analysis, and graph-based lineage representations to analyze how code evolves under LLM-driven methods such as LLaMEA, LLaMEA-HPO, and EoH. The methods are studied on three benchmark classes: black-box optimization (BBO), online bin packing (OBP), and the traveling salesperson problem (TSP). By extracting 20 AST and complexity features and visualizing lineage with PCA and t-SNE, the work shows that LLMs produce diverse, task-dependent code, that code complexity tends to grow over the course of a run, and that this growth does not uniformly improve performance. It also shows that different LLMs leave distinct coding fingerprints. The findings suggest that combining multiple LLMs can increase diversity and performance in automated algorithm design (AAD), and the CEG approach offers a framework for diagnosing and improving LLM-assisted code evolution in practical optimization settings.
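To make the idea of AST-derived features concrete, here is a minimal sketch using Python's standard `ast` module. The five statistics and the function name `ast_features` are illustrative assumptions; they are not the paper's 20 features, and the two code strings are made-up stand-ins for a parent and a child solution.

```python
import ast


def ast_features(source: str) -> dict:
    """Compute a few illustrative AST statistics for a Python source string.

    Hypothetical feature set for illustration only, not the paper's.
    """
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))

    def depth(node: ast.AST) -> int:
        # Depth of the subtree rooted at `node`; a leaf has depth 1.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(c) for c in children), default=0)

    return {
        "node_count": len(nodes),
        "max_depth": depth(tree),
        "function_count": sum(
            isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes
        ),
        "loop_count": sum(isinstance(n, (ast.For, ast.While)) for n in nodes),
        "branch_count": sum(isinstance(n, ast.If) for n in nodes),
    }


# Made-up parent and child programs standing in for two generations of code.
parent = "def f(x):\n    return x + 1\n"
child = (
    "def f(x):\n"
    "    for i in range(3):\n"
    "        if x > i:\n"
    "            x += i\n"
    "    return x\n"
)

parent_features = ast_features(parent)
child_features = ast_features(child)
print(parent_features)
print(child_features)
```

Feature vectors like these, computed for every generated program, are the kind of input that could then be projected with PCA or t-SNE to compare programs across a run.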
Abstract
Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to generate competitive algorithms or the code optimization stalls, and we are left with no recourse because we lack an understanding of the generation process and of the generated code. We present a novel approach to mitigate this problem by enabling users to analyze the code generated inside the evolutionary process and how it evolves over repeated prompting of the LLM. We show results for three benchmark problem classes and demonstrate novel insights. In particular, LLMs tend to generate more complex code with repeated prompting, but additional complexity can hurt algorithmic performance in some cases. Different LLMs have different coding "styles", and the code generated by one LLM tends to be dissimilar to that generated by other LLMs. These two findings suggest that using different LLMs inside code evolution frameworks might produce higher-performing code than using only one LLM.
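The observation that added complexity can hurt performance can be checked along parent-to-child edges of a lineage graph. The sketch below assumes each generated program is recorded with a parent identifier, a scalar complexity measure, and a fitness where higher is better. The `Node` fields, the `edge_deltas` helper, and all numbers are hypothetical and are not data from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    id: int
    parent_id: Optional[int]  # None marks an initial (root) solution
    complexity: float         # assumed measure, e.g. an AST node count
    fitness: float            # assumed convention: higher is better


def edge_deltas(nodes):
    """Return (parent_id, child_id, complexity change, fitness change) per edge."""
    by_id = {n.id: n for n in nodes}
    deltas = []
    for n in nodes:
        if n.parent_id is not None:
            p = by_id[n.parent_id]
            deltas.append(
                (p.id, n.id, n.complexity - p.complexity, n.fitness - p.fitness)
            )
    return deltas


# Made-up run: node 1 has two children, one of which gets more complex but worse.
run = [
    Node(0, None, 40, 0.50),
    Node(1, 0, 55, 0.62),
    Node(2, 1, 90, 0.58),
    Node(3, 1, 60, 0.70),
]

deltas = edge_deltas(run)
# Edges where complexity increased while fitness decreased.
hurt = [(p, c) for p, c, dc, df in deltas if dc > 0 and df < 0]
print(hurt)
```

Listing such edges per run is one simple way to locate the mutations where a stalled optimization began adding code without improving the algorithm.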
