Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada, David Towers, Matthew Forshaw, Amir Atapour-Abarghouei, A. Stephen McGough
TL;DR
This paper addresses NAS generalization by introducing eight previously unseen datasets, spanning Type-1 and Type-2 tasks, that stress-test methods on problems unknown at development time. It benchmarks standard CNN baselines, NAS methods (PC-DARTS, DrNAS, Bonsai-Net), and random search, and finds mixed generalization performance across datasets. The key contributions are the dataset descriptions, splits, and baseline results, together with a discussion of generalization limits and the need for more unseen benchmarks. Overall, the work argues for extending NAS evaluation beyond CIFAR-10 and ImageNet to better assess real-world applicability.
Abstract
The boundless number of neural networks that could be used to solve a problem, each with different performance, means that a Deep Learning expert is required to identify the best one. This runs counter to the hope of removing the need for experts. Neural Architecture Search (NAS) offers a solution by automatically identifying the best architecture. However, to date, NAS work has focused on a small set of datasets which we argue are not representative of real-world problems. We introduce eight new datasets created for a series of NAS Challenges: AddNIST, Language, MultNIST, CIFARTile, Gutenberg, Isabella, GeoClassing, and Chesseract. These datasets and challenges were developed to direct attention to issues in NAS development and to encourage authors to consider how their models will perform on datasets unknown to them at development time. We present experiments using standard Deep Learning methods, as well as the best results from challenge participants.
