BeetleVerse: A Study on Taxonomic Classification of Ground Beetles

S M Rayeed, Alyson East, Samuel Stevens, Sydne Record, Charles V Stewart

TL;DR

BeetleVerse addresses automated taxonomic classification of ground beetles by benchmarking 12 vision and vision-language models on four long-tailed datasets. ViLT with an MLP head performs best, reaching 97% accuracy at genus level and 94% at species level. Sample-efficiency experiments show that training data can be cut by up to 50% with minimal loss in performance, while domain adaptation experiments expose a substantial gap between lab and in-situ imagery. The study also investigates multimodal extensions using morphological traits and environmental data, highlighting when additional modalities help or hinder performance. Together, these results provide a foundation for scalable, cross-domain taxonomic classification of beetles and inform practical data-design strategies for long-tailed ecological datasets.

Abstract

Ground beetles are a highly sensitive and speciose biological indicator, making them vital for monitoring biodiversity. However, they are currently an underutilized resource because taxonomic experts must manually perform challenging species differentiations based on subtle morphological differences, which precludes widespread application. In this paper, we evaluate 12 vision models on taxonomic classification across four diverse, long-tailed datasets spanning over 230 genera and 1769 species, with images ranging from controlled laboratory settings to challenging field-collected (in-situ) photographs. We further explore taxonomic classification in two important real-world contexts: sample efficiency and domain adaptation. Our results show that the Vision and Language Transformer combined with an MLP head is the best-performing model, with 97% accuracy at genus level and 94% at species level. Sample efficiency analysis shows that we can reduce training data requirements by up to 50% with minimal compromise in performance. The domain adaptation experiments reveal significant challenges when transferring models from lab to in-situ images, highlighting a critical domain gap. Overall, our study lays a foundation for large-scale automated taxonomic classification of beetles, and beyond that, advances sample-efficient learning and cross-domain adaptation for diverse long-tailed ecological datasets.
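
The abstract does not describe how the reduced training sets were drawn. As an illustration only, the sketch below assumes per-class (stratified) subsampling, which keeps every species represented when a long-tailed training set is cut down. The function name `stratified_subset` and its parameters are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def stratified_subset(labels, fraction, min_per_class=1, seed=0):
    """Return sorted indices of a per-class random subsample of `labels`.

    Assumption (not from the paper): each class keeps ceil(fraction * n)
    of its n samples, but never fewer than `min_per_class`, so rare
    species in the long tail are not dropped entirely.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        wanted = max(min_per_class, math.ceil(fraction * len(indices)))
        k = min(len(indices), wanted)  # cannot take more than the class has
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy long-tailed label list: one common, one uncommon, one rare species.
toy_labels = ["common"] * 10 + ["uncommon"] * 3 + ["rare"]
half = stratified_subset(toy_labels, fraction=0.5)
print(len(half), sorted({toy_labels[i] for i in half}))
```

A uniformly random 50% subset would be simpler, but on a long-tailed dataset it can remove singleton species altogether, which is why the sketch samples within each class.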

Paper Structure

This paper contains 41 sections, 16 figures, 13 tables.

Figures (16)

  • Figure 1: Samples from the four datasets described in the data collection section. From left: Mecyclothorax konanus (from BeetlePUUM), Poecilus scitulus (from BeetlePalooza), Amara aulica (from NHM-Carabids), and Carabus vietinghoffii (from I1MC).
  • Figure 2: Probability density distributions of datasets. X-axis: Number of samples per species (abundance). Y-axis: Probability density (relative frequency of species with given sample count). Histogram: Distribution of species by number of samples. Curve: Fitted probability distribution based on dataset statistics. The plots illustrate the variation in samples per species, with fitted probability distributions and key statistical parameters including mean, median, and quartile ranges (Q1, Q3). All four datasets exhibit characteristic right-skewed distributions (skewness values from 1.53 to 6.30), reflecting the common long-tailed pattern in ecological datasets where a few species are extensively sampled while most are represented by relatively few specimens.
  • Figure 3: Performance of ViLT when probed on sample subsets of different sizes. Subset1: 2900 images, Subset2: 5800 images, Subset3: 14500 images, Half-Set: 30000 images, Full-Set: 63077 images. The y-axis shows the model's accuracy on the corresponding data.
  • Figure 4: Cross-dataset domain adaptation performance of ViLT. Case A: Train on NHM-Carabids, Test on I1MC (at genus); Case B: same as A (at species); Case C: Train on BeetlePalooza, Test on I1MC (at genus); Case D: same as C (at species); Case E: Train on NHM-Carabids, Test on BeetlePalooza (at genus); Case X: Average of Cases A, B, C, D. Cases A to D represent lab-to-in-situ domain shifts for both genus and species levels, while Case E is evaluated at the genus level only due to limited species-level overlap between NHM-Carabids and BeetlePalooza.
  • Figure 5: A sample group image and the corresponding individual crops from the BeetlePUUM dataset. The leftmost panel shows the group image with a measurement scale, while the four panels to its right show the same specimens individually cropped.
  • ...and 11 more figures
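
The summary statistics named in the Figure 2 caption (mean, median, quartiles, skewness) can be computed directly from per-species sample counts. The following is a minimal sketch, assuming skewness is the population Fisher-Pearson coefficient and quartiles use the inclusive method; the caption does not state which estimators the authors used, and `abundance_summary` is a hypothetical helper.

```python
import statistics
from collections import Counter


def abundance_summary(species_labels):
    """Summarise the samples-per-species distribution of a dataset.

    `species_labels` holds one species name per image. Needs at least
    two distinct species so that quartiles are defined.
    """
    # Number of images per species, i.e. the abundance of each species.
    counts = sorted(Counter(species_labels).values())
    n = len(counts)
    mean = statistics.fmean(counts)

    # Population central moments, used for skewness g1 = m3 / m2**1.5.
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m3 = sum((c - mean) ** 3 for c in counts) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0

    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": median,
        "q1": q1,
        "q3": q3,
        "skewness": skewness,
    }


# Toy example: three sparsely sampled species and one heavily sampled one.
toy = ["sp_a"] + ["sp_b"] + ["sp_c"] * 2 + ["sp_d"] * 10
print(abundance_summary(toy))
```

A positive skewness together with a mean well above the median is the signature of the right-skewed, long-tailed pattern the caption describes.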