BeetleFlow: An Integrative Deep Learning Pipeline for Beetle Image Processing
Fangxun Liu, S M Rayeed, Samuel Stevens, Alyson East, Cheng Hsuan Chiang, Colin Lee, Daniel Yi, Junke Yang, Tejas Naik, Ziyi Wang, Connor Kilrain, Elijah H Buckwalter, Jiacheng Hou, Saul Ibaven Bueno, Shuheng Wang, Xinyue Ma, Yifan Liu, Zhiyuan Tao, Ziheng Zhang, Eric Sokol, Michael Belitz, Sydne Record, Charles V. Stewart, Wei-Lun Chao
TL;DR
BeetleFlow addresses the bottleneck of processing thousands of beetle tray images with an integrated 3-stage pipeline that combines open-vocabulary detection, a vision-language verifier, and transformer-based fine-grained segmentation. It runs Grounding DINO detection iteratively with a final LLaVA-NeXT check, followed by cropping and optional metadata sorting, and then applies two Mask2Former models for 5- and 9-class morphological segmentation. The approach reaches 97.81% detection accuracy on NEON-derived trays and an mIoU of up to 85.11% for 5-class segmentation, enabling high-throughput beetle morphometrics and downstream biodiversity analyses. The framework generalizes to other biological imaging tasks and supports future enhancements such as automatic scale calibration and colorimetric standardization for broader applicability.
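For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of mean intersection-over-union (mIoU) on flattened label maps. The function name `mean_iou` and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for illustration; the summary does not state which averaging convention was used.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over two equally long sequences of class labels.

    Assumption: a class absent from both pred and target is skipped rather
    than counted as IoU 0 or 1. Other conventions exist.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labeled c in both maps.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labeled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

With `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.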
Abstract
In entomology and ecology research, biologists often need to collect large numbers of insects, among which beetles are the most common. A common way for biologists to organize beetles is to place them on trays and photograph each tray. Given images of thousands of such trays, an automated pipeline is needed to process the data at scale for further research. We therefore develop a 3-stage pipeline that detects all the beetles on each tray, sorts and crops the image of each beetle, and performs morphological segmentation on the cropped beetles. For detection, we design an iterative process that uses a transformer-based open-vocabulary object detector and a vision-language model. For segmentation, we manually labeled 670 beetle images and fine-tuned two variants of a transformer-based segmentation model to achieve fine-grained segmentation of beetles with relatively high accuracy. The pipeline integrates multiple deep learning methods and is specialized for beetle image processing; it can greatly improve the efficiency of processing large-scale beetle data and accelerate biological research.
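The iterative detection step described above can be sketched as a detect, mask, and repeat loop followed by a verification pass. This is only one plausible reading: the abstract does not specify how iterations are driven, so the masking strategy, the `max_rounds` limit, and all names below (`iterative_detect`, `toy_detect`, `toy_verify`, `TRAY`) are hypothetical. The detector and verifier are passed in as plain callables, with toy stand-ins taking the place of the open-vocabulary detector and the vision-language model.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates
Image = List[List[int]]          # toy grayscale image stored as rows of pixels


def mask_boxes(image: Image, boxes: List[Box], fill: int = 0) -> Image:
    """Return a copy of the image with every box region painted over."""
    out = [row[:] for row in image]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = fill
    return out


def crop(image: Image, box: Box) -> Image:
    """Cut the box region out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def iterative_detect(
    image: Image,
    detect: Callable[[Image], List[Box]],
    verify: Callable[[Image], bool],
    max_rounds: int = 5,
) -> List[Box]:
    """Detect, mask out what was found, and repeat; then verify each crop.

    Assumption: masking earlier detections lets later rounds surface objects
    the detector missed. The final check runs on crops of the original image.
    """
    found: List[Box] = []
    work = image
    for _ in range(max_rounds):
        new_boxes = detect(work)
        if not new_boxes:
            break  # nothing left to find
        found.extend(new_boxes)
        work = mask_boxes(work, new_boxes)
    return [box for box in found if verify(crop(image, box))]


# Toy stand-ins (hypothetical): 9 marks a beetle pixel, 5 a paper label.
TRAY: Image = [
    [9, 9, 0, 0, 5, 5],
    [9, 9, 0, 0, 5, 5],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
]


def toy_detect(image: Image) -> List[Box]:
    """Return one 2x2 box at the first non-background pixel, if any."""
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value != 0:
                return [(x, y, x + 2, y + 2)]
    return []


def toy_verify(patch: Image) -> bool:
    """Accept a crop only if every pixel carries the beetle value."""
    return all(value == 9 for row in patch for value in row)
```

Calling `iterative_detect(TRAY, toy_detect, toy_verify)` finds three candidate boxes over three rounds and keeps the two that pass verification, dropping the label region. In a real setting the two callables would wrap the actual detector and vision-language model, which this sketch deliberately leaves out.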
