BeetleVerse: A Study on Taxonomic Classification of Ground Beetles
S M Rayeed, Alyson East, Samuel Stevens, Sydne Record, Charles V Stewart
TL;DR
BeetleVerse tackles the challenge of automating taxonomic classification of ground beetles by benchmarking a broad set of vision and vision-language models across four long-tailed datasets. The study shows that ViLT with an MLP head achieves the best genus- and species-level accuracy (97% and 94%, respectively), that training data can be reduced substantially with little loss in performance, and that a notable domain gap separates lab and in-situ imagery. It also investigates multimodal extensions using morphological traits and environmental data, identifying when additional modalities help or hinder performance. Together, the work lays a foundation for scalable, cross-domain taxonomic classification of beetles and informs practical data-design strategies for long-tailed ecological datasets.
Abstract
Ground beetles are highly sensitive and speciose biological indicators, making them vital for monitoring biodiversity. However, they remain underutilized because differentiating species requires taxonomic experts to manually examine subtle morphological differences, which precludes widespread application. In this paper, we evaluate 12 vision models on taxonomic classification across four diverse, long-tailed datasets spanning over 230 genera and 1769 species, with images ranging from controlled laboratory settings to challenging field-collected (in-situ) photographs. We further explore taxonomic classification in two important real-world contexts: sample efficiency and domain adaptation. Our results show that the Vision and Language Transformer (ViLT) combined with an MLP head is the best-performing model, with 97% accuracy at the genus level and 94% at the species level. Sample efficiency analysis shows that training data requirements can be reduced by up to 50% with minimal loss in performance. The domain adaptation experiments reveal significant challenges when transferring models from lab to in-situ images, highlighting a critical domain gap. Overall, our study lays a foundation for large-scale automated taxonomic classification of beetles and, beyond that, advances sample-efficient learning and cross-domain adaptation for diverse long-tailed ecological datasets.
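How the training set is reduced matters for the sample-efficiency result, because in long-tailed data many species have only a handful of images and a naive random subsample can drop them entirely. The protocol is not specified here, so the sketch below assumes a simple per-class (stratified) subsample: keep a fixed fraction of each species' images, but never fewer than a minimum per species. The function `subsample_per_class` and its `min_per_class` floor are hypothetical illustrations, not the authors' procedure.

```python
import math
import random
from collections import Counter, defaultdict


def subsample_per_class(samples, fraction, min_per_class=1, seed=0):
    """Hypothetical stratified subsampling for a long-tailed training set.

    samples: list of (image_id, label) pairs.
    Keeps ceil(fraction * n) images of each class (n = class size),
    but never fewer than min_per_class and never more than n.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Group samples by class label.
    by_class = defaultdict(list)
    for image_id, label in samples:
        by_class[label].append((image_id, label))

    kept = []
    for label in sorted(by_class):
        items = by_class[label]
        # Round up, apply the floor, and cap at the class size.
        k = min(len(items), max(min_per_class, math.ceil(fraction * len(items))))
        kept.extend(rng.sample(items, k))
    return kept


# Toy long-tailed set: one common, one uncommon, and one singleton species.
toy = ([(f"a{i}", "A") for i in range(100)]
       + [(f"b{i}", "B") for i in range(10)]
       + [("c0", "C")])
half = subsample_per_class(toy, 0.5)
print(sorted(Counter(label for _, label in half).items()))
# prints [('A', 50), ('B', 5), ('C', 1)]
```

Because of the per-class floor, the effective reduction is slightly smaller than the nominal fraction (56 of 111 images here rather than 55.5), a trade-off that keeps every rare species represented in the reduced training set.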
