CAP: A Context-Aware Neural Predictor for NAS

Han Ji; Yuqi Feng; Yanan Sun

TL;DR

This work tackles the high annotation cost of neural predictors in neural architecture search (NAS). It introduces CAP, a context-aware neural predictor that pre-trains on unlabeled architectures using a context-aware self-supervised task over graph representations, enabling expressive, generalizable architecture embeddings with few labeled examples. CAP achieves state-of-the-art ranking and efficient search across NAS-Bench-101, NAS-Bench-201, and DARTS spaces, often using substantially fewer annotated architectures than prior predictors. Ablation studies corroborate the effectiveness of the context-aware pre-training and the proposed fine-tuning and loss strategies, highlighting CAP's practical potential for accelerating NAS.

Abstract

Neural predictors effectively accelerate the time-consuming performance evaluation stage of neural architecture search (NAS) because they directly estimate the performance of unseen architectures. Despite this effectiveness, training a powerful neural predictor with few annotated architectures remains a major challenge. In this paper, we propose a context-aware neural predictor (CAP) that needs only a few annotated architectures for training by exploiting contextual information from the architectures. Specifically, the input architectures are encoded into graphs, and the predictor infers the contextual structure around the nodes inside each graph. Then, enhanced by the proposed context-aware self-supervised task, the pre-trained predictor obtains expressive and generalizable representations of architectures. As a result, only a few annotated architectures are sufficient for training. Experimental results in different search spaces demonstrate the superior performance of CAP compared with state-of-the-art neural predictors. In particular, CAP can rank architectures precisely with a budget of only 172 annotated architectures in NAS-Bench-101. Moreover, CAP helps find promising architectures in both the NAS-Bench-101 and DARTS search spaces on the CIFAR-10 dataset, serving as a useful navigator for NAS to explore the search space efficiently.
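As a rough illustration of the graph encoding and the "contextual structure around the nodes" mentioned in the abstract, the sketch below represents a small cell as a list of operations plus an adjacency matrix, then splits the nodes around a chosen center into a central subgraph and a context region. The 5-node cell, the operation names, the function names, and the hop radii `k`, `r1`, `r2` are all assumptions made for illustration; the abstract does not specify how CAP actually defines these subgraphs.

```python
from collections import deque

# Hypothetical 5-node cell; operation names are illustrative only.
OPS = ["input", "conv3x3", "conv1x1", "maxpool3x3", "output"]

# ADJ[i][j] == 1 means there is a directed edge from node i to node j.
ADJ = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]


def hop_distances(adj, center):
    """Breadth-first hop distance from `center`, ignoring edge direction."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v in range(len(adj)):
            if (adj[u][v] or adj[v][u]) and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def central_and_context(adj, center, k=1, r1=1, r2=2):
    """Split nodes into a central subgraph and a context region.

    Assumed definitions (not taken from the paper):
      central: nodes within k hops of the center
      context: nodes between r1 and r2 hops from the center
    """
    dist = hop_distances(adj, center)
    central = sorted(v for v, d in dist.items() if d <= k)
    context = sorted(v for v, d in dist.items() if r1 <= d <= r2)
    return central, context


central, context = central_and_context(ADJ, center=3)
print("central ops:", [OPS[v] for v in central])
print("context ops:", [OPS[v] for v in context])
```

Under these assumptions, a pre-training task could treat the central subgraph and context region from the same architecture as a positive pair, and regions taken from different architectures as negative pairs.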

Paper Structure (26 sections, 4 equations, 4 figures, 5 tables)

Introduction
Related Works
Neural Predictor
Context-Aware Self-Supervised Learning
Methodology
Architecture Encoding
Context-Aware Self-Supervised Task
Performance Prediction
Decoder-only Fine-tuning.
Full Fine-tuning.
Partial Fine-tuning.
Experiments
Experimental Settings
NAS-Bench-101 Search Space.
NAS-Bench-201 Search Space.
...and 11 more sections

Figures (4)

Figure 1: Number of annotated architectures needed to evaluate all the architectures in NAS-Bench-101. Trained with only 172 annotated architectures, CAP beats most existing predictors that use 424 annotated ones for training. Moreover, CAP uses $10\times$ fewer annotated architectures than other predictors with only a marginal performance drop.
Figure 2: Overall framework of the proposed CAP method. (a) First, a large number of architectures are encoded into graph data. In each architecture, operation types are represented by nodes and their connections are denoted by an adjacency matrix. (b) For each input graph, the central subgraph and corresponding context graphs are extracted to construct a graph pair. Then, the encoder is encouraged to match the graph pairs correctly during the context-aware self-supervised task. For example, the central subgraph and context graphs from $G_1$ form a positive pair. (c) Once the pre-training stage is finished, only a few annotated architectures are used to fine-tune the predictor.
Figure 3: Results of the ablation study with different data splits. Kendall’s Tau is calculated over 5 independent runs.
Figure 4: Visualization of architecture representations from the pre-trained CAP (left) and the baseline predictor (right). Both predictors are trained with 100 annotated architectures. We randomly sample 20,000 architectures and display their average test accuracy in each small area. Different colors denote architectures with different test accuracy.
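
Figure 3 reports ranking quality as Kendall's Tau. For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of the tau-a variant (no tie correction), which compares how often a predictor orders pairs of architectures the same way as their true accuracies. The function name and the accuracy values are made up for illustration and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(predicted, actual):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two items to rank")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1  # the pair is ordered the same way in both lists
        elif sign < 0:
            discordant += 1  # the pair is ordered in opposite ways
    return (concordant - discordant) / (n * (n - 1) // 2)


# Illustrative (fabricated) predicted vs. true accuracies for 4 architectures.
predicted = [0.91, 0.88, 0.93, 0.85]
actual = [0.92, 0.90, 0.91, 0.86]
tau = kendall_tau(predicted, actual)
print(f"Kendall's tau = {tau:.3f}")
```

A value of 1 means the predictor ranks every pair in the same order as the true accuracies, and -1 means every pair is reversed. In practice a library routine that handles ties, such as `scipy.stats.kendalltau`, would usually be preferable.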
