Resource-Efficient Iterative LLM-Based NAS with Feedback Memory

Xiaojie Gu; Dmitry Ignatov; Radu Timofte

Abstract

Neural Architecture Search (NAS) automates network design, but conventional methods demand substantial computational resources. We propose a closed-loop pipeline leveraging large language models (LLMs) to iteratively generate, evaluate, and refine convolutional neural network architectures for image classification on a single consumer-grade GPU without LLM fine-tuning. Central to our approach is a historical feedback memory inspired by Markov chains: a sliding window of $K{=}5$ recent improvement attempts keeps context size constant while providing sufficient signal for iterative learning. Unlike prior LLM optimizers that discard failure trajectories, each history entry is a structured diagnostic triple -- recording the identified problem, suggested modification, and resulting outcome -- treating code execution failures as first-class learning signals. A dual-LLM specialization reduces per-call cognitive load: a Code Generator produces executable PyTorch architectures while a Prompt Improver handles diagnostic reasoning. Since both the LLM and architecture training share limited VRAM, the search implicitly favors compact, hardware-efficient models suited to edge deployment. We evaluate three frozen instruction-tuned LLMs (${\leq}7$B parameters) across up to 2000 iterations in an unconstrained open code space, using one-epoch proxy accuracy on CIFAR-10, CIFAR-100, and ImageNette as a fast ranking signal. On CIFAR-10, DeepSeek-Coder-6.7B improves from 28.2% to 69.2%, Qwen2.5-7B from 50.0% to 71.5%, and GLM-5 from 43.2% to 62.0%. A full 2000-iteration search completes in ${\approx}18$ GPU hours on a single RTX 4090, establishing a low-budget, reproducible, and hardware-aware paradigm for LLM-driven NAS without cloud infrastructure.
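The sliding-window feedback memory described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `FeedbackEntry` and `FeedbackMemory`, the field names, and the prompt formatting in `render` are assumptions made for illustration. Only the window size $K{=}5$ and the triple of problem, suggested modification, and outcome come from the text.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeedbackEntry:
    """One diagnostic triple; the field names are illustrative assumptions."""
    problem: str                      # identified problem
    suggestion: str                   # suggested modification
    outcome: str                      # e.g. "improved", "regressed", "error"
    accuracy: Optional[float] = None  # None when the generated code failed to run


class FeedbackMemory:
    """Sliding window over the K most recent improvement attempts."""

    def __init__(self, k: int = 5) -> None:
        # A bounded deque evicts the oldest entry automatically,
        # so the rendered context never grows beyond K entries.
        self._entries = deque(maxlen=k)

    def add(self, entry: FeedbackEntry) -> None:
        self._entries.append(entry)

    def __len__(self) -> int:
        return len(self._entries)

    def render(self) -> str:
        """Format the window as plain text for inclusion in a prompt."""
        lines = []
        for i, e in enumerate(self._entries, start=1):
            acc = "n/a" if e.accuracy is None else f"{e.accuracy:.1f}%"
            lines.append(
                f"[{i}] problem: {e.problem} | suggestion: {e.suggestion} "
                f"| outcome: {e.outcome} | accuracy: {acc}"
            )
        return "\n".join(lines)
```

Failed executions are stored like any other attempt (with `accuracy=None`), which mirrors the idea of treating code execution failures as learning signals rather than discarding them.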

Paper Structure

This paper contains 32 sections, 3 equations, 3 figures, 2 tables, and 1 algorithm.

Introduction
Related Work
Neural Architecture Search.
LLMs for Code Generation and AutoML.
Iterative LLM Optimization.
Positioning relative to prior work.
Methodology
Pipeline Overview
Code Generator
Evaluator
Prompt Improver with Historical Feedback Memory
Markov Property Formalization.
Experiments
Experimental Setup
Dataset.
...and 17 more sections

Figures (3)

Figure 1: Overview of the iterative NAS pipeline. The Code Generator produces a candidate architecture as executable PyTorch code. The Evaluator validates and trains it using one-epoch proxy evaluation. The Prompt Improver analyzes results with historical feedback memory to generate targeted improvement suggestions for the next iteration.
Figure 2: One-epoch proxy accuracy on CIFAR-10 (top row, a--c), CIFAR-100 (middle row, d--f), and ImageNette (bottom row, g--i) across all iterations. Light curves show per-iteration accuracy (an iteration that ends in an error falls back to the previous value), dashed lines show the smoothed trend (window $w{=}15$), and bold lines show the best-so-far trajectory. All models exhibit clear upward trends. For DeepSeek-Coder on ImageNette, only the first 30 iterations are plotted because all subsequent iterations resulted in errors.
Figure 3: Ablation study of DeepSeek-Coder-6.7B-Instruct on CIFAR-10, CIFAR-100, and ImageNette datasets. The results highlight the effectiveness of the complete iterative loop with historical feedback memory compared to its ablated variants.
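The three curves described in the Figure 2 caption (per-iteration accuracy with error fallback, smoothed trend, and best-so-far trajectory) can be derived from a raw accuracy log roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, failed iterations are represented as `None`, the fallback value before any successful iteration is assumed to be `0.0`, and the smoothing is assumed to be a trailing moving average, since the caption gives only the window size $w{=}15$.

```python
from typing import List, Optional


def fill_errors(accs: List[Optional[float]], initial: float = 0.0) -> List[float]:
    """Replace None (a failed iteration) with the previous plotted value."""
    filled, prev = [], initial
    for a in accs:
        prev = prev if a is None else a
        filled.append(prev)
    return filled


def moving_average(values: List[float], window: int = 15) -> List[float]:
    """Trailing mean over up to `window` values (an assumed smoothing choice)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def best_so_far(values: List[float]) -> List[float]:
    """Running maximum, i.e. the best accuracy seen up to each iteration."""
    out, best = [], float("-inf")
    for v in values:
        best = max(best, v)
        out.append(best)
    return out
```

A typical use would be `per_iter = fill_errors(raw_log)`, followed by `moving_average(per_iter)` and `best_so_far(per_iter)` to obtain the dashed and bold curves, respectively.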
