A machine learning pipeline for automated insect monitoring

Aditya Jain, Fagner Cunha, Michael Bunsen, Léonard Pasi, Anna Viklund, Maxim Larrivée, David Rolnick

TL;DR

The paper tackles scalable insect monitoring amid widespread insect decline by presenting a complete, open-source pipeline for automated moth monitoring from camera traps. The pipeline comprises detection, moth/non-moth classification, fine-grained moth species identification, and tracking, with zero-shot GBIF/iNaturalist data and synthetic-data training to reduce manual labeling. It reports region-specific validation, data augmentation strategies, and long-tail handling, and provides an AMI Data Companion and Web Platform to facilitate adoption by ecologists with varying ML expertise. The work is deployed across multiple continents and aims to enable massively scalable insect data collection to inform land use decisions and climate adaptation policies.

Abstract

Climate change and other anthropogenic factors have led to a catastrophic decline in insects, endangering both biodiversity and the ecosystem services on which human society depends. Data on insect abundance, however, remains woefully inadequate. Camera traps, conventionally used for monitoring terrestrial vertebrates, are now being modified for insects, especially moths. We describe a complete, open-source machine learning-based software pipeline for automated monitoring of moths via camera traps, including object detection, moth/non-moth classification, fine-grained identification of moth species, and tracking individuals. We believe that our tools, which are already in use across three continents, represent the future of massively scalable data collection in entomology.
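
The four stages named in the abstract can be pictured as a simple chain. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: `detect`, `is_moth`, and `classify_species` are hypothetical callables standing in for trained models, and the greedy IoU matcher is only a simple stand-in for the tracking stage, whose actual method is not described in this summary.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    is_moth: bool
    species: Optional[str] = None   # only set for detections classified as moths
    track_id: Optional[int] = None  # filled in by the tracking stage


def run_pipeline(image, detect: Callable, is_moth: Callable,
                 classify_species: Callable) -> List[Detection]:
    """Detection -> moth/non-moth -> species, for a single image.

    All three callables are hypothetical placeholders for trained models.
    """
    results = []
    for box in detect(image):
        moth = is_moth(image, box)
        species = classify_species(image, box) if moth else None
        results.append(Detection(box, moth, species))
    return results


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tracks(prev: List[Detection], curr: List[Detection],
                next_id: int, min_iou: float = 0.3) -> int:
    """Greedy stand-in tracker (an assumption, not the paper's method).

    Each current detection inherits the track of the best-overlapping,
    not-yet-matched previous detection; otherwise it starts a new track.
    Returns the next unused track id.
    """
    used = set()
    for det in curr:
        best, best_score = None, min_iou
        for i, p in enumerate(prev):
            if i in used:
                continue
            score = iou(det.box, p.box)
            if score >= best_score:
                best, best_score = i, score
        if best is not None:
            used.add(best)
            det.track_id = prev[best].track_id
        else:
            det.track_id = next_id
            next_id += 1
    return next_id
```

In use, `run_pipeline` would be called once per image and `link_tracks` once per consecutive pair of images, carrying `next_id` forward so that an individual keeps the same `track_id` for as long as it stays in view.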

Paper Structure

This paper contains 14 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: The first two panels depict a moth camera trap used by our partners. The third panel shows our insect localization and species prediction on a raw image, with red boxes marking insects classified as moths (with fine-grained species predictions) and blue boxes marking non-moths.
  • Figure 2: Machine learning workflow.
  • Figure 3: A preview of the AMI Web Platform in development.
  • Figure 4: Examples of images that are removed during the dataset cleaning procedure. Some images are placeholders that do not contain any animals; (a) is an extreme case referenced by hundreds of thousands of occurrences. In some cases, a single picture shows more than one species: each individual is counted as a separate occurrence, and all of these occurrences reference the same image (b). Some occurrences have more than one image, some of which are small thumbnails (c). Our models cover only adult individuals, so pictures of non-adults (d) are removed. Finally, some pictures contain no specimens, only descriptions (e).
  • Figure 5: Left column: blobdata-ResNet50; middle column: syntheticdata-ResNet50; right column: syntheticdata-MobileNetV3. As seen in the first column, the model trained on a small amount of labelled data produces many false positives and fails to detect insects that are close to each other, whereas the models trained on large amounts of synthetic data (last two columns) overcome those challenges. We ultimately use the model with the MobileNetV3-Large-FPN backbone, as it is six times faster than its ResNet50 counterpart and similar in accuracy.
  • ...and 1 more figure
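
The cleaning rules described in the Figure 4 caption can be sketched as a record filter. This is a hedged illustration only: the `ImageRecord` fields, the `min_side` threshold, and the choice to keep records with an unknown life stage are all assumptions made for the example, not values from the paper. Case (e), pictures that contain only descriptions, cannot be caught from metadata alone and is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class ImageRecord:
    # Hypothetical record layout, chosen for this sketch only.
    occurrence_id: str
    image_url: str
    width: int
    height: int
    life_stage: str  # e.g. "adult", "larva"; empty string if unknown


def clean_records(records: List[ImageRecord],
                  min_side: int = 150) -> List[ImageRecord]:
    """Drop records matching cases (a) to (d) of the Figure 4 caption.

    min_side is an assumed thumbnail threshold in pixels.
    """
    # Map each image to the set of distinct occurrences that reference it.
    occurrences_per_image = defaultdict(set)
    for r in records:
        occurrences_per_image[r.image_url].add(r.occurrence_id)

    kept = []
    for r in records:
        # (a), (b): placeholder or multi-species images shared by occurrences.
        if len(occurrences_per_image[r.image_url]) > 1:
            continue
        # (c): thumbnails.
        if min(r.width, r.height) < min_side:
            continue
        # (d): explicitly non-adult individuals (unknown stage is kept here).
        if r.life_stage and r.life_stage.lower() != "adult":
            continue
        kept.append(r)
    return kept
```

Dropping every image shared by more than one occurrence handles both the placeholder case and the multi-species case with a single rule, at the cost of also discarding any legitimately shared image.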