Towards Scalable Insect Monitoring: Ultra-Lightweight CNNs as On-Device Triggers for Insect Camera Traps

Ross Gardiner, Sareh Rowands, Benno I. Simmons

TL;DR

This paper addresses the poor suitability of PIR triggers for detecting small insects by developing ultra-lightweight CNNs that run on low-power MCUs as on-device triggers for insect camera traps. Using tiling to preserve high-resolution insect features, transfer learning, and 8-bit quantization, the authors demonstrate zero-latency capture with AUC between 91.8% and 96.4% on validation data and above 87% on data from distributions unseen during training, while drawing under 300 mW on an ESP32-S3. Saliency analyses indicate robust, insect-focused representations, and hardware tests show viable frame rates for practical deployment, enabling longer-term, scalable monitoring. The work provides a concrete, open-source path toward more efficient, high-coverage insect monitoring and highlights future directions for improving generalization and higher-resolution input handling on resource-constrained devices.

Abstract

Camera traps, combined with AI, have emerged as a way to achieve automated, scalable biodiversity monitoring. However, the passive infrared (PIR) sensors that trigger camera traps are poorly suited for detecting small, fast-moving ectotherms such as insects. Insects comprise over half of all animal species and are key components of ecosystems and agriculture. The need for an appropriate and scalable insect camera trap is critical in the wake of concerning reports of declines in insect populations. This study proposes an alternative to the PIR trigger: ultra-lightweight convolutional neural networks running on low-powered hardware to detect insects in a continuous stream of captured images. We train a suite of models to distinguish insect images from backgrounds. Our design achieves zero latency between trigger and image capture. Our models are rigorously tested and achieve high accuracy ranging from 91.8% to 96.4% AUC on validation data and >87% AUC on data from distributions unseen during training. The high specificity of our models ensures minimal saving of false positive images, maximising deployment storage efficiency. High recall scores indicate a minimal false negative rate, maximising insect detection. Further analysis with saliency maps shows the learned representation of our models to be robust, with low reliance on spurious background features. Our system is also shown to operate deployed on off-the-shelf, low-powered microcontroller units, consuming a maximum power draw of less than 300mW. This enables longer deployment times using cheap and readily available battery components. Overall we offer a step change in the cost, efficiency and scope of insect monitoring. Solving the challenging trigger problem, we demonstrate a system which can be deployed for far longer than existing designs and budgets power and bandwidth effectively, moving towards a generic insect camera trap.
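
The trigger described in the abstract can be sketched as a simple loop: each captured frame is scored by a binary classifier, and the original frame is kept only when the score passes a threshold. Because the frame that was classified is the one that gets saved, there is no delay between trigger and capture. The sketch below is illustrative only and is not the authors' implementation: `downsample`, `mean_intensity_model`, `run_trigger`, the threshold of 0.5, and the downsampling factor are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def downsample(image: Image, factor: int) -> Image:
    """Hypothetical preprocessing step: keep every `factor`-th pixel on each axis."""
    return [row[::factor] for row in image[::factor]]


def mean_intensity_model(image: Image) -> float:
    """Toy stand-in for the CNN: returns the mean pixel value as an 'insect score'."""
    values = [v for row in image for v in row]
    return sum(values) / len(values)


def run_trigger(
    frames: Iterable[Image],
    model: Callable[[Image], float],
    threshold: float = 0.5,  # assumed value, not taken from the paper
    factor: int = 2,         # assumed value, not taken from the paper
) -> List[Image]:
    """Return the original frames whose score meets the threshold."""
    saved: List[Image] = []
    for frame in frames:
        score = model(downsample(frame, factor))  # classify the preprocessed variant
        if score >= threshold:
            saved.append(frame)  # keep the full-resolution original
    return saved
```

On a real device the list of saved frames would be replaced by a write to storage, and the model would be the quantised CNN rather than a mean-intensity heuristic.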

Paper Structure

This paper contains 15 sections, 3 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Software pipeline showing how the trigger (a CNN binary classifier) can be built into programmes on deployed devices. First, an image is captured and a preprocessed variant of it is fed into the model for prediction. The prediction is then thresholded to decide whether the original image should be saved, and the loop repeats.
  • Figure 2: Simplified workflow for saliency map analysis showing the main stages. Inference yields a prediction and a corresponding saliency map. Salient pixels are thresholded to create a binary map, the pixels within the insect bounding box are counted, and the portion of salient pixels is plotted for all thresholds and images.
  • Figure 3: ROC curves for each model computed from the test dataset. Shows true positive rate (recall) against false positive rate (1 - specificity) over all thresholds. Original and quantised models 1-8 shown as solid and dotted lines respectively.
  • Figure 4: Resultant curves for our saliency score, $\bar{P}$, at each threshold, $t$, averaged over the images in our test dataset, measuring the alignment of saliency maps with insect image regions. Each solid line depicts one of models 1-8.
  • Figure 5: Average power consumption for each model versus operational frames per second processed by the ESP32-S3 chipset.
  • ...and 2 more figures
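
The saliency analysis summarised in the captions for Figures 2 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: it assumes the per-image score is the fraction of above-threshold saliency pixels that fall inside the insect bounding box, and that $\bar{P}$ at threshold $t$ is the mean of that fraction over images. The function names and the `(x0, y0, x1, y1)` box convention with exclusive upper bounds are hypothetical.

```python
from typing import List, Sequence, Tuple

SaliencyMap = List[List[float]]   # saliency values, assumed normalised to [0, 1]
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), upper bounds exclusive


def salient_fraction_in_box(saliency: SaliencyMap, box: Box, t: float) -> float:
    """Fraction of pixels with saliency >= t that lie inside the bounding box."""
    x0, y0, x1, y1 = box
    total = inside = 0
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            if value >= t:  # binarise the saliency map at threshold t
                total += 1
                if x0 <= x < x1 and y0 <= y < y1:
                    inside += 1
    # Assumed convention: score is 0 when no pixel passes the threshold.
    return inside / total if total else 0.0


def mean_saliency_score(samples: Sequence[Tuple[SaliencyMap, Box]], t: float) -> float:
    """Average the per-image fraction over all (saliency map, box) pairs."""
    scores = [salient_fraction_in_box(s, b, t) for s, b in samples]
    return sum(scores) / len(scores)
```

Evaluating `mean_saliency_score` over a range of thresholds would give one curve per model of the kind Figure 4 describes. A score near 1 means the salient pixels concentrate on the insect rather than the background.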