
Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms

Niki van Stein, Anna V. Kononova, Lars Kotthoff, Thomas Bäck

TL;DR

The paper tackles the challenge of understanding how Large Language Models (LLMs) guide the automatic design and optimization of algorithms within evolutionary frameworks. It introduces Code Evolution Graphs (CEGs), which combine Abstract Syntax Tree (AST) features, static code analysis, and graph-based lineage representations to dissect the evolution of code produced by LLM-driven methods such as LLaMEA, LLaMEA-HPO, and EoH on black-box optimization (BBO), online bin packing (OBP), and traveling salesperson problem (TSP) benchmarks. By extracting 20 AST/complexity features and visualizing lineage with PCA and t-SNE, the work shows that LLMs produce diverse, task-dependent code whose complexity tends to grow over time, and that this growth is not uniformly beneficial; different LLMs also leave distinct coding fingerprints. The findings suggest that combining multiple LLMs can increase diversity and performance in automated algorithm design (AAD), and they provide a framework for diagnosing and improving LLM-assisted code evolution in practical optimization settings.
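As a rough illustration of what "AST features" can mean in practice, the sketch below uses Python's standard `ast` module to compute a few structural metrics for one generated algorithm. The feature names (`n_nodes`, `max_depth`, `cyclomatic_estimate`, and so on) and the decision-point counting rule are assumptions chosen for illustration; they are not the paper's exact feature set.

```python
import ast

# Node types treated as decision points for a rough McCabe-style estimate (assumed rule).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp)


def ast_features(source: str) -> dict:
    """Hypothetical structural features for one generated algorithm."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))

    def depth(node: ast.AST) -> int:
        # Length of the deepest path from this node down to a leaf (a leaf has depth 1).
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    return {
        "n_nodes": len(nodes),
        "max_depth": depth(tree),
        "n_functions": sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes),
        "n_classes": sum(isinstance(n, ast.ClassDef) for n in nodes),
        "n_calls": sum(isinstance(n, ast.Call) for n in nodes),
        # 1 + number of decision points anywhere in the module.
        "cyclomatic_estimate": 1 + sum(isinstance(n, DECISION_NODES) for n in nodes),
    }


generated = (
    "class RandomSearch:\n"
    "    def __init__(self, budget):\n"
    "        self.budget = budget\n"
    "    def __call__(self, f):\n"
    "        best = None\n"
    "        for _ in range(self.budget):\n"
    "            if best is None:\n"
    "                best = 0\n"
    "        return best\n"
)
print(ast_features(generated))
```

Computing one such feature vector per candidate in a run gives the matrix that dimensionality-reduction methods like PCA or t-SNE can then project.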

Abstract

Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to generate competitive algorithms or the code optimization stalls, and we are left with no recourse because we lack an understanding of the generation process and of the generated code. We present a novel approach to mitigate this problem by enabling users to analyze the code generated inside the evolutionary process and how it evolves over repeated prompting of the LLM. We show results for three benchmark problem classes and demonstrate novel insights. In particular, LLMs tend to generate more complex code with repeated prompting, but additional complexity can hurt algorithmic performance in some cases. Different LLMs have different coding "styles", and the code one LLM generates tends to be dissimilar to the code generated by other LLMs. These two findings suggest that using different LLMs inside code evolution frameworks might produce higher-performing code than using only one LLM.
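To make the idea of analyzing code "over repeated prompting" concrete, here is a minimal sketch of a Code Evolution Graph data structure. It assumes each candidate records its generation, parent IDs, fitness and source code; the `Candidate` class, its field names, and the token-count definition (lexical tokens excluding layout tokens and comments) are hypothetical and may differ from the paper's implementation. The figures also use a second y-axis variant, the first principal component of the AST metrics, which is not shown here.

```python
import io
import tokenize
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    # Hypothetical record for one LLM-generated algorithm in an evolutionary run.
    uid: int
    generation: int
    source: str
    fitness: float
    parents: List[int] = field(default_factory=list)


# Layout-only tokens and comments are ignored when counting (an assumed definition).
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
        tokenize.COMMENT, tokenize.ENDMARKER}


def token_count(source: str) -> int:
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type not in SKIP)


def code_evolution_graph(candidates):
    """Nodes: uid -> (x = generation, y = token count, fitness); edges: parent -> child."""
    nodes = {c.uid: (c.generation, token_count(c.source), c.fitness) for c in candidates}
    edges = [(p, c.uid) for c in candidates for p in c.parents if p in nodes]
    return nodes, edges


run = [
    Candidate(0, 0, "x = 1\n", 0.2),
    Candidate(1, 1, "def f(a):\n    return a + 1\n", 0.5, parents=[0]),
]
nodes, edges = code_evolution_graph(run)
print(nodes)  # {0: (0, 3, 0.2), 1: (1, 10, 0.5)}
print(edges)  # [(0, 1)]
```

Plotting the node coordinates with marker size scaled by fitness, and drawing the edges between them, gives a lineage view in which growth in code size across generations is directly visible.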

Paper Structure

This paper contains 14 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: t-SNE visualisation of the 26 code features for different LLaMEA configurations, with 5 independent runs per configuration and 5 Random Search runs. Color denotes the method, shape denotes the independent run, and marker size denotes normalized fitness (bigger is better).
  • Figure 2: Code Evolution Graphs for LLaMEA using different LLMs and a baseline Random Search on BBO. The left side shows CEGs with the first principal component (PCA) of the AST graph metrics on the y-axis; the number gives the fraction of the total variance explained by this component. The right side shows CEGs with the total token count on the y-axis. Each row represents a different algorithm configuration (LLM) and each column one independent run (3 runs in total).
  • Figure 3: Code Evolution Graphs for LLaMEA-HPO and EoH on the Online Bin Packing Problems. The left side shows CEGs with the first principal component (PCA) of the AST graph metrics on the y-axis; the number gives the fraction of the total variance explained by this component. The right side shows CEGs with the total token count on the y-axis. The top row shows different runs of LLaMEA-HPO and the bottom row shows different runs of EoH.
  • Figure 4: Code Evolution Graphs for LLaMEA-HPO and EoH on the Traveling Salesperson Problems. The left side shows CEGs with the first principal component (PCA) of the AST graph metrics on the y-axis; the number gives the fraction of the total variance explained by this component. The right side shows CEGs with the total token count on the y-axis. The top row shows different runs of LLaMEA-HPO and the bottom row shows different runs of EoH.
  • Figure 5: Spearman rank correlation between each code feature (column) and the performance (fitness) of the algorithms, for all benchmarks and methods (rows).
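The feature-versus-fitness analysis summarized in Figure 5 can be sketched as follows: for each code feature, Spearman's rho is the Pearson correlation of the rank-transformed feature values and fitness values, with tied values given their average rank. This is a generic, dependency-free implementation assuming features are stored as one dictionary per algorithm; the function names and example data are hypothetical. `scipy.stats.spearmanr` is an off-the-shelf alternative that uses the same average-rank treatment of ties.

```python
import math


def average_ranks(values):
    # Rank values 1..n; tied values share the mean of the ranks they occupy.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Undefined (NaN) when either variable is constant.
    return cov / (sx * sy) if sx and sy else float("nan")


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def feature_fitness_correlations(feature_rows, fitness):
    """feature_rows: one dict per algorithm mapping feature name -> value."""
    names = feature_rows[0].keys()
    return {name: spearman([row[name] for row in feature_rows], fitness) for name in names}


# Hypothetical example: larger, deeper code paired with lower fitness.
rows = [{"n_nodes": 120, "max_depth": 9}, {"n_nodes": 200, "max_depth": 9},
        {"n_nodes": 340, "max_depth": 12}, {"n_nodes": 410, "max_depth": 15}]
fitness = [0.61, 0.58, 0.40, 0.35]
print(feature_fitness_correlations(rows, fitness))
```

A negative rho for a complexity feature, as in this toy data, is the pattern that would indicate added complexity coinciding with worse performance.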