Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning
Deshani Geethika Poddenige, Sachith Seneviratne, Damith Senanayake, Mahesan Niranjan, PN Suganthan, Saman Halgamuge
TL;DR
Arch-LLM addresses the challenge of generating valid neural architectures by replacing continuous latent spaces with a discrete, VQ-VAE–based latent representation and then fine-tuning a Large Language Model to generate architecture sequences. The approach converts architectures into codebook index sequences and leverages LLMs for sequence modeling, enabling unsupervised neural architecture generation and a NAS algorithm grounded in text-to-text generation. Empirical results on NAS-Bench-101 and NAS-Bench-201 show substantial improvements in validity, uniqueness, and novelty over VAE baselines, with controllable generation behavior via temperature. The work demonstrates a practical, unsupervised pathway to NAS that harnesses cross-domain NLP techniques and highlights both the potential and limitations of discrete latent representations for architectural search.
Abstract
Unsupervised representation learning has been widely explored across various modalities, including neural architectures, where it plays a key role in downstream applications such as Neural Architecture Search (NAS). These methods typically learn an unsupervised representation space before generating or sampling architectures for the downstream search. A common approach uses Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space; however, sampling from these spaces often yields a high percentage of invalid or duplicate neural architectures. This could be due to the unnatural mapping of an inherently discrete architectural space onto a continuous one, which emphasizes the need for a robust discrete representation of these architectures. To address this, we introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space more naturally aligned with discrete neural architectures. In contrast to VAEs, VQ-VAEs (i) map each architecture into a discrete code sequence and (ii) allow the prior to be learned by any generative model rather than assuming a normal distribution. We then represent these architecture latent codes as numerical sequences and train a text-to-text model, leveraging a Large Language Model, to learn and generate sequences representing architectures. We evaluate our method on Inception/ResNet-like cell-based search spaces, namely NAS-Bench-101 and NAS-Bench-201. Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NAS-Bench-101 and over 8% on NAS-Bench-201. Finally, we demonstrate the applicability of our method to NAS by employing a sequence-modeling-based NAS algorithm.
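To make the discrete-code idea concrete, the following minimal sketch shows the generic vector-quantization step (nearest-codebook lookup) and one possible way to serialize the resulting indices as text for a sequence model. It is an illustration under stated assumptions, not the paper's implementation: the function names (`quantize`, `codes_to_text`, `text_to_codes`), the space-separated text format, and the toy codebook and latent values are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def codes_to_text(codes: Sequence[int]) -> str:
    """Serialize code indices as a space-separated string (assumed format)."""
    return " ".join(str(c) for c in codes)


def text_to_codes(text: str, codebook_size: int) -> List[int]:
    """Parse a generated string back into code indices, rejecting bad indices."""
    codes = [int(tok) for tok in text.split()]
    if any(c < 0 or c >= codebook_size for c in codes):
        raise ValueError("code index outside the codebook")
    return codes


# Toy values for illustration only; a real codebook is learned during training.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
latents = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.9)]

codes = quantize(latents, codebook)
text = codes_to_text(codes)
print(codes, repr(text))
```

In this sketch, a string produced by the language model would be parsed with `text_to_codes` and the resulting indices passed to a decoder to recover an architecture; the range check is one simple guard against sequences that do not correspond to any codebook entry.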
