On the use of LLMs to generate a dataset of Neural Networks
Nadia Daoudi, Jordi Cabot
TL;DR
The paper tackles the scarcity of public, diverse neural network (NN) datasets suitable for evaluating tools for NN verification, refactoring, and migration. It uses GPT-5 to generate a dataset of 608 NN architectures spanning multiple architecture types, tasks, input types, and complexity levels, guided by carefully defined requirements and a uniform PyTorch implementation template. A validation tool based on static analysis and symbolic tracing checks that each generated architecture adheres to its prompt and is structurally sound; the eight non-compliant instances were regenerated and re-validated. The resulting dataset contains 6842 layers across 38 unique layer types and is released publicly to support research on NN reliability and adaptability, enabling fairer and more thorough benchmarking of NN tools.
Abstract
Neural networks (NNs) are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as NN code verification, refactoring, and migration. These tools play a crucial role in ensuring both the correctness and maintainability of neural network architectures, helping to prevent implementation errors, simplify model updates, and ensure that complex networks can be reliably extended and reused. Yet assessing their effectiveness remains challenging because there are few public, diverse datasets of neural networks that would allow systematic evaluation. To address this gap, we leverage large language models (LLMs) to automatically generate a dataset of neural networks that can serve as a benchmark for evaluating such tools. The dataset is designed to cover diverse architectural components and to handle multiple input data types and tasks. In total, 608 samples are generated, each conforming to a set of precise design choices. To further ensure their consistency, we validate the correctness of the generated networks using static analysis and symbolic tracing. We make the dataset publicly available to support the community in advancing research on neural network reliability and adaptability.
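As an illustration of what a static-analysis check on a generated network might look like, the following minimal sketch parses PyTorch source with Python's standard `ast` module and counts the layer types it instantiates. It is not the authors' validation tool: the sample model `TinyCNN`, the helper names `count_layer_types` and `check_requirements`, and the required-layer set are all hypothetical, and the symbolic tracing step mentioned above is not reproduced here.

```python
import ast
from collections import Counter

# Hypothetical generated sample; kept as a string so that no PyTorch
# installation is needed to analyze it statically.
SAMPLE_SOURCE = '''
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.act = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        x = self.pool(self.act(self.conv(x)))
        return self.fc(x.flatten(1))
'''


def count_layer_types(source: str, alias: str = "nn") -> Counter:
    """Count calls of the form `<alias>.<LayerName>(...)` in the source."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == alias
        ):
            counts[node.func.attr] += 1
    return counts


def check_requirements(counts: Counter, required: set) -> list:
    """Return the required layer types that the source never instantiates."""
    return sorted(set(required) - set(counts))


layer_counts = count_layer_types(SAMPLE_SOURCE)
# Assumed prompt requirement: the network must use these layer types.
missing = check_requirements(layer_counts, {"Conv2d", "Linear", "LSTM"})
print(dict(layer_counts))
print(missing)
```

A sample with a non-empty `missing` list would be flagged as non-compliant with its prompt. A check like this only inspects the code's syntax, so shape mismatches between layers would need a complementary step such as symbolic tracing.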
