LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization

Muhammad U. Nasir; Sam Earle; Christopher Cleghorn; Steven James; Julian Togelius

TL;DR

LLMatic is a neural architecture search (NAS) framework that combines code-generating LLMs with quality-diversity optimization to search the architecture space efficiently. It maintains two complementary archives (one for networks, one for prompts) and applies mutation and crossover guided by fitness and curiosity, discovering diverse, high-performing networks with only 2,000 evaluations. Empirical results on CIFAR-10 and NAS-bench-201 show accuracy close to the state of the art: LLMatic outperforms a GPT-4-based baseline and reaches near-optimal NAS-bench-201 performance with a 6.1B-parameter CodeGen model. The approach emphasizes diversity and resource efficiency, and the results suggest further gains from larger LLMs and evaluation on broader benchmarks.

Abstract

Large Language Models (LLMs) have emerged as powerful tools capable of accomplishing a broad spectrum of tasks. Their abilities span numerous areas, and one area where they have made a significant impact is code generation. Here, we propose using the coding abilities of LLMs to introduce meaningful variations to code defining neural networks. Meanwhile, Quality-Diversity (QD) algorithms are known to discover diverse and robust solutions. By merging the code-generating abilities of LLMs with the diversity and robustness of QD solutions, we introduce LLMatic, a Neural Architecture Search (NAS) algorithm. While LLMs struggle to conduct NAS directly through prompts, LLMatic uses a procedural approach, leveraging QD for prompts and network architecture to create diverse and high-performing networks. We test LLMatic on the CIFAR-10 and NAS-bench-201 benchmarks, demonstrating that it can produce competitive networks while evaluating just 2,000 candidates, even without prior knowledge of the benchmark domain or exposure to any previous top-performing models for the benchmark. The open-sourced code is available at https://github.com/umair-nasir14/LLMatic

Paper Structure (11 sections, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Approach
Evaluating LLMatic
Setting up LLMatic
Ablation Study
Ablation Results and Discussion
Experiments on NAS-bench-201
Results
Conclusion and Future Work
Acknowledgments

Figures (5)

Figure 1: The flow of LLMatic. In the initial round of evolution, an initial network with a random prompt goes through a mutation operation. The resulting network individual and prompt individual are then evaluated and stored in separate archives. During the evolutionary loop, the selected prompt and network go through an evolutionary operation (the prompt is fixed if the operation is crossover) to create more network and prompt individuals that fill and illuminate the archives.
Figure 2: Best accuracy per generation for LLMatic and all ablation studies. Each experiment is conducted with 30 seeds. The solid line is the mean and the shaded region is the standard deviation. EfficientNet-B0 is the best-performing EfficientNet on CIFAR-10.
Figure 3: Archives generated by LLMatic. We selected the archive with the median number of filled cells across experiments over 30 seeds. One panel shows the prompt archive and the other shows the network archive. The lighter the colour of a filled cell, the better the fitness of the individual. White indicates that the cell is empty.
Figure 4: Number of trainable networks created per generation. The total number of networks created is 100 per generation. Results are computed over 10 runs. The shaded region is the standard deviation.
Figure 5: Test accuracies of all networks across all datasets, together with the best networks found by LLMatic in each generation.
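
The flow described in the Figure 1 caption can be sketched as a toy quality-diversity loop. This is only an illustrative sketch under loud assumptions, not the authors' implementation: the code LLM is replaced by hand-written stand-ins (toy_mutate, toy_crossover), a "network" is just a (depth, width) pair, toy_fitness stands in for trained accuracy, and the prompt list, the cell descriptor, the 0.3 crossover rate, and the curiosity update values are all hypothetical choices made for the example.

```python
import random

# Hypothetical mutation prompts; the real system prompts a code-generating LLM.
PROMPTS = ["add a layer", "widen the network", "shrink the network"]


def toy_mutate(net, prompt):
    """Stand-in for the LLM: edits a toy (depth, width) 'architecture'."""
    depth, width = net
    if prompt == "add a layer":
        return (depth + 1, width)
    if prompt == "widen the network":
        return (depth, width * 2)
    return (depth, max(8, width // 2))


def toy_crossover(a, b):
    """Stand-in for LLM crossover: depth from one parent, width from the other."""
    return (a[0], b[1])


def toy_fitness(net):
    """Stand-in for trained accuracy: peaks at depth 6, width 64."""
    depth, width = net
    return 1.0 / (1.0 + abs(depth - 6) + abs(width - 64) / 16)


def descriptor(net):
    """Map a network to an archive cell (coarse depth and width buckets)."""
    depth, width = net
    return (min(depth, 9), min(width // 16, 9))


def try_insert(archive, cell, fitness, item):
    """Elitist archive rule: keep the item if the cell is empty or it is fitter."""
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, item)
        return True
    return False


def run(generations=50, seed=0):
    rng = random.Random(seed)
    net_archive = {}                        # cell -> (fitness, network)
    curiosity = {p: 0.0 for p in PROMPTS}   # toy prompt archive: prompt -> curiosity
    start = (1, 16)
    try_insert(net_archive, descriptor(start), toy_fitness(start), start)
    for _ in range(generations):
        parents = [entry[1] for entry in net_archive.values()]
        if len(parents) >= 2 and rng.random() < 0.3:
            # Crossover: no prompt is selected, so prompt scores are untouched.
            child = toy_crossover(*rng.sample(parents, 2))
            prompt = None
        else:
            # Mutation: pick the most curious prompt, breaking ties at random.
            best = max(curiosity.values())
            prompt = rng.choice([p for p, c in curiosity.items() if c == best])
            child = toy_mutate(rng.choice(parents), prompt)
        added = try_insert(net_archive, descriptor(child), toy_fitness(child), child)
        if prompt is not None:
            # Assumed update: reward prompts whose offspring enter the archive.
            curiosity[prompt] += 1.0 if added else -0.5
    return net_archive, curiosity
```

Calling run() returns the filled network archive and the per-prompt curiosity scores. The point of the sketch is the separation of concerns: the network archive keeps the fittest individual per cell, while the prompt scores steer which variation is tried next.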
