FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

Dongyeong Hwang; Hyunju Kim; Sunwoo Kim; Kijung Shin

TL;DR

This paper introduces FlowerFormer, a graph transformer that incorporates the information flows within a neural architecture. Experiments show that FlowerFormer outperforms existing neural encoding methods, and that its effectiveness extends beyond computer vision models to graph neural networks and automatic speech recognition models.

Abstract

The success of a specific neural network architecture is closely tied to the dataset and task it tackles; there is no one-size-fits-all solution. Thus, considerable efforts have been made to quickly and accurately estimate the performance of neural architectures, without full training or evaluation, for given tasks and datasets. Neural architecture encoding has played a crucial role in this estimation, and graph-based methods, which treat an architecture as a graph, have shown prominent performance. For enhanced representation learning of neural architectures, we introduce FlowerFormer, a powerful graph transformer that incorporates the information flows within a neural architecture. FlowerFormer consists of two key components: (a) bidirectional asynchronous message passing, inspired by the flows; (b) global attention built on flow-based masking. Our extensive experiments demonstrate the superiority of FlowerFormer over existing neural encoding methods, and its effectiveness extends beyond computer vision models to include graph neural networks and automatic speech recognition models. Our code is available at http://github.com/y0ngjaenius/CVPR2024_FLOWERFormer.

Paper Structure (31 sections, 12 equations, 7 figures, 9 tables, 1 algorithm)

Introduction
Related work
Neural architecture encoding
Graph transformers (GTs)
Proposed method: FlowerFormer
Motivation of capturing information flows
Input modeling
Flower layers
Flow encode module
Flow-aware global attention module
Overall framework: FlowerFormer
Experiments
Experimental settings
Datasets
Baseline methods
...and 16 more sections

Figures (7)

Figure 1: Information flows within an example neural architecture from the NAS-Bench-101 benchmark (Ying et al., 2019). The architecture is represented as a directed graph where each node corresponds to an operation, and the topological structure of the graph encodes the sequence in which these operations are performed. For instance, the '$1\times1$' (convolution) operation is executed only after the '$3\times3$' (convolution) and 'mp' (max pooling) operations have been completed. The forward pass, depicted by blue arrows, is followed by the backpropagation of the loss, depicted by orange arrows. The number displayed above each node indicates the processing order within each flow.
Figure 2: Overview of proposed FlowerFormer, which contains two key modules in each of its layers: the flow encode module and the flow-aware global attention module. The flow encode module performs bidirectional asynchronous message passing, inspired by forward and backward passes, to produce a node embedding matrix $H_{\text{flow}}$. The flow-aware global attention module computes attention with a flow-based masking scheme to yield another node embedding matrix $H_{\text{global}}$. These two embedding matrices, $H_{\text{flow}}$ and $H_{\text{global}}$, are combined and then projected to produce updated node embeddings at each layer. This process is iterated over $L$ layers, and the output node embeddings are aggregated to form the final architecture embedding, which is fed into a regressor for performance prediction.
Figure 3: An example neural architecture from the NAS-Bench-101 dataset, represented as a directed acyclic graph (DAG), and its adjacency matrix $A$. Each column of the node feature matrix $X$ corresponds to a specific operation, and each row in $X$ is a one-hot vector indicating the type of operation associated with the corresponding node.
Figure 4: Example topological generations. Nodes 1 and 2 are devoid of incoming edges, and thus they constitute the first topological generation $T^{G}_{1}$. Upon removal of nodes 1 and 2, nodes 3, 4, and 5 no longer have incoming edges, and thus they compose the second generation $T^{G}_{2}$. Subsequently, nodes 6 and 7 form the third and fourth generations, respectively.
Figure 5: Flow encode module. During forward message passing, node embeddings are updated following the order of topological generations. Conversely, during backward message passing, node embeddings are updated in the reverse order of the generations.
...and 2 more figures
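
The topological generations described in the Figure 4 caption, and the forward and reverse processing orders described in the Figure 5 caption, can be illustrated with a small sketch. This is not the authors' implementation: the function name `topological_generations` and the edge list are assumptions made for illustration, with the edges chosen only so that the result matches the generations named in the caption (the actual edges of the figure are not given here).

```python
from collections import defaultdict


def topological_generations(num_nodes, edges):
    """Group the nodes of a DAG into topological generations.

    Nodes are labeled 1..num_nodes; edges is a list of (src, dst) pairs.
    Generation k holds the nodes left without incoming edges once all
    earlier generations have been removed.
    """
    in_degree = {v: 0 for v in range(1, num_nodes + 1)}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # First generation: nodes with no incoming edges.
    current = sorted(v for v, d in in_degree.items() if d == 0)
    generations = []
    while current:
        generations.append(current)
        next_gen = []
        for v in current:
            # "Remove" v by decrementing the in-degree of its successors.
            for w in successors[v]:
                in_degree[w] -= 1
                if in_degree[w] == 0:
                    next_gen.append(w)
        current = sorted(next_gen)

    if sum(len(g) for g in generations) != num_nodes:
        raise ValueError("graph contains a cycle")
    return generations


# Hypothetical edge list, consistent with the Figure 4 caption.
edges = [(1, 3), (1, 4), (2, 4), (2, 5), (3, 6), (4, 6), (5, 6), (6, 7)]
generations = topological_generations(7, edges)

forward_order = generations         # order used for forward message passing
backward_order = generations[::-1]  # reverse order for backward message passing

print(forward_order)   # [[1, 2], [3, 4, 5], [6], [7]]
print(backward_order)  # [[7], [6], [3, 4, 5], [1, 2]]
```

Nodes in the same generation do not depend on one another, so their embeddings could be updated together, while different generations are processed one after another.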
