LEMUR Neural Network Dataset: Towards Seamless AutoML
Arash Torabi Goodarzi, Roman Kochnev, Waleed Khalid, Hojjat Torabi Goudarzi, Furui Qin, Tolgay Atinc Uzun, Yashkumar Sanjaybhai Dhameliya, Yash Kanubhai Kathiriya, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte
TL;DR
LEMUR is an open-source dataset and framework that treats neural network architectures themselves as data. It provides standardized PyTorch implementations, YAML specifications, and an SQLite-backed ledger for reproducible benchmarking across common vision and NLP tasks. It integrates Optuna for automated hyperparameter optimization and offers visualization tools and API access, with the aim of accelerating AutoML research, enabling fair cross-model comparisons, and supporting community contributions. The system emphasizes reproducibility, traceability, and accessibility: its modular components are designed so that new architectures, datasets, and metrics can be added under a single evaluation pipeline. Together, these pieces cover the path from model implementation to analysis and reporting, so LEMUR can serve as a shared resource for large-scale neural network experimentation and AutoML studies.
Abstract
Neural networks are the backbone of modern artificial intelligence, but designing, evaluating, and comparing them remains labor-intensive. While numerous datasets exist for training, there are few standardized collections of the models themselves. We introduce LEMUR, an open-source dataset and framework that provides a large collection of PyTorch-based neural networks across tasks such as classification, segmentation, detection, and natural language processing. Each model follows a unified template, with configurations and results stored in a structured database to ensure consistency and reproducibility. LEMUR integrates automated hyperparameter optimization via Optuna, includes statistical analysis and visualization tools, and offers an API for seamless access to performance data. The framework is extensible, allowing researchers to add new models, datasets, or metrics without breaking compatibility. By standardizing implementations and unifying evaluation, LEMUR aims to accelerate AutoML research, enable fair benchmarking, and reduce barriers to large-scale neural network experimentation. To support adoption and collaboration, LEMUR and its plugins are released under the MIT license at https://github.com/ABrain-One/nn-dataset, https://github.com/ABrain-One/nn-plots, and https://github.com/ABrain-One/nn-vr
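To make the idea of a structured results database concrete, the sketch below shows one way a ledger of training runs could be kept in SQLite and queried for a per-model comparison. It is an illustration only: the table layout, column names, and the helpers `open_ledger`, `record_run`, and `best_per_model` are assumptions made for this example and are not taken from the LEMUR codebase. The task, dataset, and model names and the scores are placeholder values, not reported results.

```python
import json
import sqlite3

# Hypothetical schema for illustration; not LEMUR's actual tables or columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id          INTEGER PRIMARY KEY,
    task        TEXT    NOT NULL,
    dataset     TEXT    NOT NULL,
    metric      TEXT    NOT NULL,
    model       TEXT    NOT NULL,
    hyperparams TEXT    NOT NULL,
    epoch       INTEGER NOT NULL,
    score       REAL    NOT NULL
)
"""


def open_ledger(path=":memory:"):
    """Open (or create) the ledger; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_run(conn, task, dataset, metric, model, hyperparams, epoch, score):
    """Append one evaluation result; hyperparameters are stored as JSON text."""
    conn.execute(
        "INSERT INTO runs (task, dataset, metric, model, hyperparams, epoch, score) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (task, dataset, metric, model,
         json.dumps(hyperparams, sort_keys=True), epoch, score),
    )
    conn.commit()


def best_per_model(conn, task, dataset, metric):
    """Return (model, best score) pairs for one task/dataset/metric, best first."""
    return conn.execute(
        "SELECT model, MAX(score) AS best FROM runs "
        "WHERE task = ? AND dataset = ? AND metric = ? "
        "GROUP BY model ORDER BY best DESC, model",
        (task, dataset, metric),
    ).fetchall()


# Placeholder entries: names and scores are made up for the example.
ledger = open_ledger()
record_run(ledger, "img-classification", "cifar-10", "acc", "ModelA", {"lr": 0.01}, 1, 0.71)
record_run(ledger, "img-classification", "cifar-10", "acc", "ModelA", {"lr": 0.01}, 2, 0.78)
record_run(ledger, "img-classification", "cifar-10", "acc", "ModelB", {"lr": 0.10}, 1, 0.74)
record_run(ledger, "img-classification", "mnist", "acc", "ModelB", {"lr": 0.10}, 1, 0.99)

best = best_per_model(ledger, "img-classification", "cifar-10", "acc")
print(best)
```

Keying every row by task, dataset, and metric is what allows models to be compared only against results produced under the same evaluation setting. A hyperparameter search such as the Optuna integration mentioned above would, under this assumed design, simply write one row per evaluated trial.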
