
BioAutoML-NAS: An End-to-End AutoML Framework for Multimodal Insect Classification via Neural Architecture Search on Large-Scale Biodiversity Data

Arefin Ittesafun Abian, Debopom Sutradhar, Md Rafi Ur Rashid, Reem E. Mohamed, Md Rafiqul Islam, Asif Karim, Kheng Cher Yeo, Sami Azam

TL;DR

BioAutoML-NAS is an end-to-end AutoML framework that combines neural architecture search (NAS) with multimodal data (images and metadata) for large-scale insect classification. The model uses a differentiable NAS search space for image encoding, a dedicated metadata encoder, and a fusion module. It is optimized with alternating bi-level training, and zero operations prune weak connections to yield sparse yet high-performing architectures. Empirical results on BIOSCAN-5M (96.81% accuracy) and Insects-1M (93.25% accuracy) show substantial improvements over state-of-the-art transfer learning (TL), transformer, AutoML, and NAS-based approaches, indicating robustness and scalability for biodiversity monitoring. The approach handles class imbalance and data scale without re-sampling or synthetic data, suggesting practical value for sustainable agriculture and ecological research.
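
A differentiable NAS search space is commonly realized as a DARTS-style continuous relaxation: every edge in a cell computes a softmax-weighted sum of all candidate operations, and after search each edge keeps its strongest operation. The Python sketch below illustrates that mechanism on plain lists. The four operations in `CANDIDATE_OPS`, the helpers `mixed_op` and `discretize`, and the rule that an edge is pruned when the zero operation wins are hypothetical assumptions for illustration, not the paper's ten-operation search space or its exact pruning criterion.

```python
import math
from typing import Callable, Dict, List, Optional

Op = Callable[[List[float]], List[float]]

# Hypothetical stand-ins for the paper's candidate operations (assumption).
CANDIDATE_OPS: Dict[str, Op] = {
    "zero": lambda x: [0.0] * len(x),             # zero op: edge contributes nothing
    "skip": lambda x: list(x),                     # identity / skip connection
    "avg": lambda x: [sum(x) / len(x)] * len(x),   # crude global average pooling
    "scale2": lambda x: [2.0 * v for v in x],      # placeholder for a learned conv
}


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)                                # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x: List[float], alphas: Dict[str, float]) -> List[float]:
    """Continuous relaxation: softmax(alpha)-weighted sum of every candidate op on one edge."""
    names = list(CANDIDATE_OPS)
    weights = softmax([alphas[n] for n in names])
    out = [0.0] * len(x)
    for weight, name in zip(weights, names):
        y = CANDIDATE_OPS[name](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


def discretize(alphas: Dict[str, float]) -> Optional[str]:
    """Keep the highest-weighted op; return None (prune the edge) if 'zero' wins."""
    best = max(alphas, key=alphas.get)
    return None if best == "zero" else best


if __name__ == "__main__":
    edge = {"zero": 0.1, "skip": 1.2, "avg": -0.3, "scale2": 0.4}
    print(mixed_op([1.0, 3.0], edge))
    print(discretize(edge))  # skip
```

During search the architecture logits (`alphas`) are trained by gradient descent like ordinary weights; because the zero operation competes with the others, an edge whose logit mass shifts to it effectively disappears, which is one way the sparsity described above can arise.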

Abstract

Insect classification is important for agricultural management and ecological research, because insects directly affect crop health and production. However, the task remains challenging due to the complex characteristics of insects, class imbalance, and large-scale datasets. To address these issues, we propose BioAutoML-NAS, the first BioAutoML model to use multimodal data (images and metadata). It applies neural architecture search (NAS) to the images to automatically learn the best operation for each connection within each cell. Multiple cells are stacked to form the full network, each extracting detailed image feature representations. A multimodal fusion module combines the image embeddings with metadata, allowing the model to use both visual and categorical biological information to classify insects. An alternating bi-level optimization strategy jointly updates the network weights and architecture parameters, while zero operations remove less important connections, producing sparse, efficient, and high-performing architectures. Extensive evaluation on the BIOSCAN-5M dataset shows that BioAutoML-NAS achieves 96.81% accuracy, 97.46% precision, 96.81% recall, and a 97.05% F1 score, outperforming state-of-the-art transfer learning, transformer, AutoML, and NAS methods by margins of approximately 16%, 10%, and 8%. Further validation on the Insects-1M dataset yields 93.25% accuracy, 93.71% precision, 92.74% recall, and a 93.22% F1 score. These results show that BioAutoML-NAS provides accurate and confident insect classification that supports modern sustainable farming.
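
The abstract reports precision, recall, and F1 alongside accuracy but does not state how the per-class scores are averaged. The sketch below computes both macro and support-weighted averages from a confusion matrix. Two general properties help when reading the numbers: support-weighted recall always equals accuracy (on BIOSCAN-5M both are 96.81%, which is consistent with weighted averaging, while on Insects-1M the reported values differ), and an averaged per-class F1 need not equal the harmonic mean of the averaged precision and recall (for 97.46% and 96.81% that mean would be about 97.13%, not the reported 97.05%). The function `classification_metrics` and the example matrix are illustrative assumptions, not the paper's evaluation code or data.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro- and support-weighted precision, recall, and F1.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    support = [sum(cm[i]) for i in range(k)]                         # true samples per class
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predictions per class

    precision, recall, f1 = [], [], []
    for c in range(k):
        tp = cm[c][c]
        p = tp / predicted[c] if predicted[c] else 0.0
        r = tp / support[c] if support[c] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)

    def macro(scores: List[float]) -> float:
        # Every class counts equally, so rare classes weigh as much as common ones.
        return sum(scores) / k

    def weighted(scores: List[float]) -> float:
        # Each class is weighted by its support (number of true samples).
        return sum(s * n for s, n in zip(scores, support)) / total

    return {
        "accuracy": sum(cm[c][c] for c in range(k)) / total,
        "macro_precision": macro(precision),
        "macro_recall": macro(recall),
        "macro_f1": macro(f1),
        "weighted_precision": weighted(precision),
        "weighted_recall": weighted(recall),  # algebraically equal to accuracy
        "weighted_f1": weighted(f1),
    }


if __name__ == "__main__":
    # Hypothetical, imbalanced 3-class confusion matrix (not data from the paper).
    cm = [[90, 5, 5],
          [2, 8, 0],
          [1, 1, 3]]
    for name, value in classification_metrics(cm).items():
        print(f"{name}: {value:.4f}")
```

On imbalanced data such as insect taxonomies, macro averages expose weak performance on rare classes that weighted averages can hide, so stating which scheme is used matters when comparing methods.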

Paper Structure

This paper contains 29 sections, 13 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: We process the images using a NAS-based image encoder and the metadata using a separate encoder, and fuse the two representations to obtain the classification output.
  • Figure 2: The NAS-based image encoder explores a search space of ten candidate operations, including convolutional filters, pooling layers, skip connections, and channel attention mechanisms, to automatically learn rich and detailed feature representations from input images.
  • Figure 3: Overview of the proposed BioAutoML-NAS model, highlighting dual encoders that extract multimodal features and a fusion module that integrates them for accurate classification.
  • Figure 4: In the bi-level training framework, architectural parameters are updated on odd-numbered batches, while network weights are updated on even-numbered batches, enabling stable optimization by alternating between architecture parameter updates and weight learning.
  • Figure 5: ROC curve and confusion matrix of the proposed BioAutoML-NAS model on the BIOSCAN-5M dataset, demonstrating its high classification accuracy and reliable performance across classes.
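
The alternating schedule in Figure 4, where odd-numbered batches update architecture parameters and even-numbered batches update network weights, can be sketched on a toy one-edge model. Here a hypothetical architecture parameter `a` mixes two candidate operations (identity and square) through a sigmoid, and `w` is an ordinary weight. Following common DARTS practice, the architecture step uses a validation split and the weight step a training split, with a first-order update (no unrolled second-order term). These choices, the toy target t = 3x, the learning rates, and treating each whole split as one "batch" to keep the run deterministic are assumptions, not details taken from the paper.

```python
import math
from typing import List, Tuple

Data = List[Tuple[float, float]]  # (input x, target t) pairs


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def forward(w: float, a: float, x: float) -> Tuple[float, float]:
    """Toy edge: mix identity and square ops with weight p = sigmoid(a), then scale by w."""
    p = sigmoid(a)
    mixed = p * x + (1.0 - p) * x * x
    return w * mixed, mixed


def mse(w: float, a: float, data: Data) -> float:
    return sum((forward(w, a, x)[0] - t) ** 2 for x, t in data) / len(data)


def grads(w: float, a: float, data: Data) -> Tuple[float, float]:
    """Analytic gradients of the mean squared error with respect to w and a."""
    gw = ga = 0.0
    p = sigmoid(a)
    for x, t in data:
        y, mixed = forward(w, a, x)
        err = 2.0 * (y - t)
        gw += err * mixed
        ga += err * w * (x - x * x) * p * (1.0 - p)  # d(mixed)/da = (x - x^2) p (1 - p)
    n = len(data)
    return gw / n, ga / n


def alternating_search(train: Data, val: Data, num_batches: int = 2000,
                       lr_w: float = 0.05, lr_a: float = 0.05):
    w, a = 1.0, 0.0
    history: List[str] = []
    for batch_idx in range(1, num_batches + 1):
        if batch_idx % 2 == 1:
            # Odd batch: first-order architecture step on validation data.
            _, ga = grads(w, a, val)
            a -= lr_a * ga
            history.append("arch")
        else:
            # Even batch: weight step on training data, architecture held fixed.
            gw, _ = grads(w, a, train)
            w -= lr_w * gw
            history.append("weight")
    return w, a, history


if __name__ == "__main__":
    train = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5)]
    val = [(x, 3.0 * x) for x in (0.75, 1.25)]
    w, a, history = alternating_search(train, val)
    print(f"w={w:.3f}, p(identity)={sigmoid(a):.3f}, train MSE={mse(w, a, train):.4f}")
```

Separating the two updates means each step optimizes one set of parameters while the other is held fixed, which is the simplification that makes the bi-level problem tractable; in a real search the same loop would wrap full networks and mini-batches rather than scalars.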