FR-NAS: Forward-and-Reverse Graph Predictor for Efficient Neural Architecture Search
Abstract
The paper introduces a novel Graph Neural Network (GNN) based performance predictor for Neural Architecture Search (NAS), called FR-NAS. The key idea is to leverage both the forward and reverse graph representations of neural architectures to enhance the prediction accuracy, especially when training data is limited. The paper presents a detailed analysis showing the potential benefits of using dual graph representations, and proposes a customized training loss to ensure the encoders from the forward and reverse graphs converge towards shared features. Experiments on benchmark NAS datasets demonstrate that FR-NAS outperforms state-of-the-art GNN-based predictors.
Q&A
[01] Introduction
1. What is the motivation behind the proposed FR-NAS method? FR-NAS exploits the inherent bidirectionality of neural architectures, which involve both forward and backward propagation phases, to improve graph-based performance predictors. The authors observe that when training data is limited, an encoder often struggles to embed the features needed for accurate predictions. By using separate encoders for the forward and reverse graph representations, they aim to let the two encoders collaborate and make better use of the available data.
2. What are the key contributions of this work? The key contributions of this work are:
- Analyzing the features extracted by GIN predictors using both forward and reverse graph representations, and highlighting the potential of dual graph representations in enhancing prediction accuracy.
- Designing a performance predictor that utilizes both forward and reverse graph depictions of architectures, and introducing a tailored training loss to ensure congruence in the embeddings generated by the two GIN encoders.
[02] Method
1. How are the neural architectures represented as input to the proposed predictor? The neural architectures are represented as Directed Acyclic Graphs (DAGs), where the adjacency matrix denotes the edges connecting the vertices, and a sequence of one-hot vectors represents the operations at each vertex. The authors refer to this encoding as the "forward graph encoding". The "reverse graph encoding" is derived by transposing the adjacency matrix.
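To make the two encodings concrete, here is a minimal NumPy sketch. The operation vocabulary `OPS`, the helper names, and the four-vertex example cell are illustrative assumptions, not taken from the paper; only the adjacency matrix, the one-hot operation vectors, and the transpose step come from the description above.

```python
import numpy as np

# Hypothetical operation vocabulary; each benchmark defines its own set.
OPS = ["input", "conv3x3", "conv1x1", "maxpool3x3", "output"]

def encode_architecture(edges, ops):
    """Forward graph encoding: (adjacency matrix, one-hot operation matrix)."""
    n = len(ops)
    adjacency = np.zeros((n, n), dtype=np.int64)
    for src, dst in edges:
        adjacency[src, dst] = 1  # directed edge src -> dst
    one_hot = np.zeros((n, len(OPS)), dtype=np.int64)
    for vertex, op in enumerate(ops):
        one_hot[vertex, OPS.index(op)] = 1
    return adjacency, one_hot

def reverse_encoding(adjacency, one_hot):
    """Reverse graph encoding: same operations, every edge flipped."""
    return adjacency.T.copy(), one_hot

# Illustrative cell: input feeds two branches that merge at the output.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
ops = ["input", "conv3x3", "maxpool3x3", "output"]
fwd_adj, fwd_ops = encode_architecture(edges, ops)
rev_adj, rev_ops = reverse_encoding(fwd_adj, fwd_ops)
```

With vertices in topological order the forward adjacency matrix is strictly upper triangular, so the reverse one is strictly lower triangular; the operation matrix is shared by both encodings.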
2. Can you explain the encoder and predictor design of the proposed FR-NAS method? The proposed method integrates two distinct GIN encoders to process the forward and reverse graph encodings, respectively. Each GIN encoder comprises three consecutive layers with fully connected structures and ReLU activations, followed by a Global Mean Pooling layer to extract the embeddings.
The predictor consists of two separate fully connected layers, each addressing the feature embeddings from a specific encoder. The final prediction is obtained by averaging the outputs of the two fully connected layers.
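The design described above can be sketched as a plain NumPy forward pass. This is a rough illustration under assumptions, not the authors' implementation: the hidden width of 16, the two-layer MLP inside each GIN layer, `eps = 0`, the choice to aggregate over predecessors, and the random untrained weights are all hypothetical. Only the overall shape follows the text: two encoders with three layers each, ReLU activations, global mean pooling, one fully connected head per encoder, and an averaged output.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def init_mlp(in_dim, hidden_dim):
    """Weights for the two fully connected layers inside one GIN layer (assumed shape)."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)), "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)), "b2": np.zeros(hidden_dim),
    }

def init_encoder(in_dim, hidden_dim, num_layers=3):
    dims = [in_dim] + [hidden_dim] * num_layers
    return [init_mlp(dims[i], hidden_dim) for i in range(num_layers)]

def gin_layer(adjacency, h, params, eps=0.0):
    """GIN update: own features plus the sum over predecessors, then an MLP."""
    aggregated = (1.0 + eps) * h + adjacency.T @ h
    hidden = relu(aggregated @ params["w1"] + params["b1"])
    return relu(hidden @ params["w2"] + params["b2"])

def encode(adjacency, features, encoder):
    h = features.astype(float)
    for params in encoder:
        h = gin_layer(adjacency, h, params)
    return h.mean(axis=0)  # global mean pooling over vertices

def init_head(hidden_dim):
    return {"w": rng.normal(0.0, 0.1, hidden_dim), "b": 0.0}

def predict(adjacency, features, fwd_encoder, rev_encoder, fwd_head, rev_head):
    z_fwd = encode(adjacency, features, fwd_encoder)    # forward graph
    z_rev = encode(adjacency.T, features, rev_encoder)  # reverse graph
    y_fwd = z_fwd @ fwd_head["w"] + fwd_head["b"]
    y_rev = z_rev @ rev_head["w"] + rev_head["b"]
    return 0.5 * (y_fwd + y_rev), z_fwd, z_rev

# Illustrative 4-vertex DAG with one distinct operation per vertex.
adjacency = np.array([[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]])
features = np.eye(4)
fwd_encoder, rev_encoder = init_encoder(4, 16), init_encoder(4, 16)
fwd_head, rev_head = init_head(16), init_head(16)
y, z_fwd, z_rev = predict(adjacency, features, fwd_encoder, rev_encoder, fwd_head, rev_head)
```

The two encoders share no weights; they differ only in the direction of the graph they see, and their embeddings `z_fwd` and `z_rev` are returned alongside the prediction so a training loss can compare them.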
3. What is the motivation behind the proposed training loss function? The authors observe that in the presence of limited training data, the encoders may not optimally capture features crucial for precise predictions. To mitigate this, they propose a mutual learning strategy where the encoders reciprocally reinforce their ability to identify and exploit shared features.
The training loss is formulated to reduce the discrepancy between the embeddings generated by the two encoders, inspired by the Instance Relationship Graph (IRG) framework. This loss function aims to ensure the congruence of the embeddings produced by the forward and reverse graph encoders.
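One plausible reading of such a loss is sketched below. The summary does not give the exact formula, so everything here is an assumption: the feature term follows the general IRG idea of comparing pairwise relationships between instances in a batch, and the squared-distance form, the mean-squared-error prediction term, and the weight `lam` are hypothetical choices rather than the paper's definition.

```python
import numpy as np

def pairwise_sq_dists(z):
    """Squared Euclidean distance between every pair of rows of z (batch, dim)."""
    diff = z[:, None, :] - z[None, :, :]
    return (diff ** 2).sum(axis=-1)

def feature_loss(z_fwd, z_rev):
    """Assumed IRG-style term: mismatch between the two batches' distance matrices."""
    gap = pairwise_sq_dists(z_fwd) - pairwise_sq_dists(z_rev)
    return float(np.mean(gap ** 2))

def prediction_loss(y_pred, y_true):
    """Assumed prediction term: mean squared error on the target metric."""
    return float(np.mean((y_pred - y_true) ** 2))

def total_loss(y_pred, y_true, z_fwd, z_rev, lam=1.0):
    # lam is a hypothetical weight balancing the two terms.
    return prediction_loss(y_pred, y_true) + lam * feature_loss(z_fwd, z_rev)

# Illustrative batch: 5 architectures with 8-dimensional embeddings.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(5, 8))
z_b = rng.normal(size=(5, 8))
y_true = rng.uniform(size=5)
y_pred = y_true + 0.1
loss = total_loss(y_pred, y_true, z_a, z_b, lam=0.5)
```

Under this formulation the feature term compares how architectures relate to one another within a batch rather than their raw coordinates, so the two encoders are pushed toward a shared structure without being forced to emit identical vectors.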
[03] Experiments
1. What are the benchmark search spaces used in the experiments? The experiments were conducted on three benchmark search spaces: NAS-Bench-101, NAS-Bench-201, and the DARTS search space.
2. How does the proposed FR-NAS method perform compared to the baseline methods? The experimental results show that the proposed FR-NAS method outperforms the state-of-the-art GNN-based peer methods, NPENAS and NPNAS, as well as other model-based predictors like NAO, BONAS, and BANANAS, across all the benchmark search spaces and training data sizes.
3. What are the key findings from the ablation study? The ablation study demonstrates the effectiveness of the two main components of the FR-NAS predictor:
- The use of two GIN encoders taking forward and reverse graph representations as input.
- The incorporation of the feature loss combined with the prediction loss during training.
The results show that the simple forward-and-reverse paired predictor outperforms the single direction predictors, and the addition of the feature loss further improves the performance, especially with smaller training data sizes.