Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning

Deshani Geethika Poddenige, Sachith Seneviratne, Damith Senanayake, Mahesan Niranjan, PN Suganthan, Saman Halgamuge

TL;DR

Arch-LLM addresses the challenge of generating valid neural architectures by replacing continuous latent spaces with a discrete, VQ-VAE–based latent representation and then fine-tuning a Large Language Model to generate architecture sequences. The approach converts architectures into codebook index sequences and leverages LLMs for sequence modeling, enabling unsupervised neural architecture generation and a NAS algorithm grounded in text-to-text generation. Empirical results on NAS-Bench-101 and NAS-Bench-201 show substantial improvements in validity, uniqueness, and novelty over VAE baselines, with controllable generation behavior via temperature. The work demonstrates a practical, unsupervised pathway to NAS that harnesses cross-domain NLP techniques and highlights both the potential and limitations of discrete latent representations for architectural search.

Abstract

Unsupervised representation learning has been widely explored across various modalities, including neural architectures, where it plays a key role in downstream applications like Neural Architecture Search (NAS). These methods typically learn an unsupervised representation space before generating or sampling architectures for the downstream search. A common approach uses Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space; however, sampling from these spaces often leads to a high percentage of invalid or duplicate neural architectures. This could be due to the unnatural mapping of an inherently discrete architectural space onto a continuous space, which emphasizes the need for a robust discrete representation of these architectures. To address this, we introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space more naturally aligned with the discrete neural architectures. In contrast to VAEs, VQ-VAEs (i) map each architecture into a discrete code sequence and (ii) allow the prior to be learned by any generative model rather than assuming a normal distribution. We then represent these architecture latent codes as numerical sequences and train a text-to-text model leveraging a Large Language Model to learn and generate sequences representing architectures. We evaluate our method on Inception/ResNet-like cell-based search spaces, namely NAS-Bench-101 and NAS-Bench-201. Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NAS-Bench-101 and over 8% on NAS-Bench-201. Finally, we demonstrate the applicability of our method in NAS by employing a sequence-modeling-based NAS algorithm.

Paper Structure

This paper contains 27 sections, 7 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of the proposed Arch-LLM framework. (1) Step 1: we train a VQ-VAE to build a discrete latent space of neural architectures. Neural architectures, given as an adjacency matrix plus one-hot operation encoding, are fed to the encoder $E$, which produces an encoded vector $Z_{e}$. The VQ component uses a learnable look-up table, the codebook $e$, to find the nearest-neighbour indices $Z$ and the corresponding vectors $Z_{q}$ for $Z_{e}$. The decoder $D$ then reconstructs the neural architecture in its original format from $Z_{q}$. (2) Step 2: the numerical sequence $Z$ is converted to a "sentence" and the LLM is fine-tuned on a text-to-text generation task. (3) Step 3: we use the fine-tuned LLM to generate architecture sequences by providing the prompts "generate" and "fill:".
  • Figure 2: Illustration of how Validity, Absolute Uniqueness and Absolute Novelty vary with the temperature value of Arch-LLM for the NAS-Bench-101 dataset when optimizing the generations for Absolute Uniqueness, as in (a), and for Absolute Novelty, as in (b). We picked temperature $t=0.7$ as the optimum Absolute Uniqueness point and temperature $t=1.8$ as the optimum Absolute Novelty point.
  • Figure 3: Illustration of the original vs. generated novel architecture distributions for Arch-LLM ($t=1.8$) fine-tuned on the NAS-Bench-101 dataset. These heatmaps correspond to sequence positions 1 and 2. Each heatmap has two rows: the top row shows the original distribution of codebook indices of the architectures used for training, and the bottom row shows the distribution of codebook indices of the novel architectures generated.
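
The quantization and "sentence" conversion described in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation: the codebook values, the vector dimensions, the space-separated sentence format, and the helper names `quantize`, `to_sentence`, and `from_sentence` are all assumptions made for illustration. It only shows the generic VQ step of replacing each encoder vector in $Z_{e}$ with the index of its nearest codebook entry, giving $Z$ and $Z_{q}$.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    z_e: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Map each encoder vector to its nearest codebook entry.

    Returns the index sequence Z and the quantized vectors Z_q.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in z_e
    ]
    z_q = [codebook[k] for k in indices]
    return indices, z_q


def to_sentence(indices: Sequence[int]) -> str:
    """Render an index sequence as text (space-separated format is an assumption)."""
    return " ".join(str(k) for k in indices)


def from_sentence(sentence: str) -> List[int]:
    """Parse a generated sentence back into codebook indices."""
    return [int(token) for token in sentence.split()]


# Toy data (hypothetical): a 3-entry codebook of 2-D vectors and an
# encoder output with 4 sequence positions.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
z_e = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7), (0.6, 0.6)]

Z, Z_q = quantize(z_e, codebook)
sentence = to_sentence(Z)
print(Z)         # [0, 1, 2, 1]
print(sentence)  # 0 1 2 1
```

In a pipeline of this shape, the sentences produced by `to_sentence` would serve as fine-tuning text for the LLM, and `from_sentence` followed by a codebook lookup would turn a generated sentence back into vectors for the decoder. A generated sentence may contain tokens that are not valid indices, so a real pipeline would need to check for that before decoding.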