On the use of LLMs to generate a dataset of Neural Networks

Nadia Daoudi, Jordi Cabot

TL;DR

The paper tackles the scarcity of public, diverse neural network datasets suitable for evaluating reliability tools like verification, refactoring, and migration. It leverages GPT-5 to generate a dataset of 608 NN architectures across multiple architecture types, tasks, input types, and complexity levels, guided by carefully defined requirements and a uniform PyTorch implementation template. A validation tool using static analysis and symbolic tracing ensures generated architectures adhere to prompts and remain structurally sound, with eight non-compliant instances regenerated and re-validated. The resulting dataset contains 6842 layers across 38 unique layer types and is released publicly to support research on NN reliability and adaptability, enabling fairer, more thorough benchmarking of NN tools.

Abstract

Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as NN code verification, refactoring, and migration. These tools play a crucial role in guaranteeing both the correctness and maintainability of neural network architectures, helping to prevent implementation errors, simplify model updates, and ensure that complex networks can be reliably extended and reused. Yet assessing their effectiveness remains challenging because of the lack of publicly available, diverse datasets of neural networks that would allow systematic evaluation. To address this gap, we leverage large language models (LLMs) to automatically generate a dataset of neural networks that can serve as a benchmark for validation. The dataset is designed to cover diverse architectural components and to handle multiple input data types and tasks. In total, 608 samples are generated, each conforming to a set of precise design choices. To further ensure their consistency, we validate the correctness of the generated networks using static analysis and symbolic tracing. We make the dataset publicly available to support the community in advancing research on neural network reliability and adaptability.
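
To make the static-analysis step more concrete, the sketch below shows one way a generated PyTorch file could be checked against its prompt without executing it. It is an illustrative assumption, not the authors' validation tool: the sample model `TinyCNN`, the helpers `count_layer_types` and `check_requirements`, and the requirement format are all hypothetical. It uses only Python's standard `ast` module and assumes layers are created as `self.<name> = nn.<Layer>(...)`.

```python
import ast
from collections import Counter

# Hypothetical generated architecture (not taken from the dataset).
SAMPLE_SOURCE = '''
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.fc(self.pool(x).flatten(1))
'''


def count_layer_types(source: str, module_alias: str = "nn") -> Counter:
    """Count `self.<name> = nn.<Layer>(...)` assignments per layer type."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        call = node.value
        if not isinstance(call, ast.Call):
            continue
        func = call.func
        is_nn_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == module_alias
        )
        assigns_to_self = any(
            isinstance(t, ast.Attribute)
            and isinstance(t.value, ast.Name)
            and t.value.id == "self"
            for t in node.targets
        )
        if is_nn_call and assigns_to_self:
            counts[func.attr] += 1
    return counts


def check_requirements(counts: Counter, required_layers: set, min_layers: int) -> dict:
    """Compare extracted layers with a (hypothetical) prompt requirement."""
    missing = sorted(set(required_layers) - set(counts))
    total = sum(counts.values())
    return {
        "missing": missing,
        "total_layers": total,
        "compliant": not missing and total >= min_layers,
    }


if __name__ == "__main__":
    layer_counts = count_layer_types(SAMPLE_SOURCE)
    print(dict(layer_counts))
    print(check_requirements(layer_counts, {"Conv2d", "Linear"}, min_layers=4))
```

A purely syntactic check like this cannot see layers nested inside `nn.Sequential(...)` or confirm that tensor shapes line up in `forward`, which is presumably why the paper pairs static analysis with symbolic tracing.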

Paper Structure (20 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Overview of our LLM-driven NN dataset generation
  • Figure 2: Depth of all layers and of CLs for the full NN dataset and for NNs grouped by complexity level