AquaMonitor: A multimodal multi-view image sequence dataset for real-life aquatic invertebrate biodiversity monitoring
Mikko Impiö, Philipp M. Rehsen, Tiina Laamanen, Arne J. Beermann, Florian Leese, Jenni Raitoharju
TL;DR
AquaMonitor is the first large aquatic invertebrate dataset collected within real-world monitoring programs. It addresses sampling biases in prior datasets by providing site- and time-stamped data with multi-view image sequences. The dataset supports three benchmarks (Monitoring, Classification, and Few-shot) and includes biomass and DNA modalities, enabling evaluation of open-set recognition, long-tailed and imbalanced learning, and cross-modal tasks. Extensive baselines across state-of-the-art backbones and fusion schemes perform strongly in closed-set classification, while open-set recognition, few-shot learning, and out-of-distribution (OOD) detection remain challenging. The resource is directly relevant to real-world biodiversity monitoring and regulatory water-quality assessment, and it paves the way for future multimodal and cross-country biodiversity inference.
Abstract
This paper presents the AquaMonitor dataset, the first large computer vision dataset of aquatic invertebrates collected during routine environmental monitoring. While several large species identification datasets exist, they are rarely collected using standardized protocols, and none focus on aquatic invertebrates, which are particularly laborious to collect. For AquaMonitor, we imaged all specimens from two years of monitoring whenever imaging was possible given practical limitations. The dataset enables the evaluation of automated identification methods for real-life monitoring purposes using a realistically challenging and unbiased setup. The dataset has 2.7M images from 43,189 specimens, DNA sequences for 1,358 specimens, and dry mass and size measurements for 1,494 specimens, which also makes it one of the largest biological multi-view and multimodal datasets to date. We define three benchmark tasks and provide strong baselines for each: (1) the Monitoring benchmark, which reflects real-life deployment challenges such as open-set recognition, distribution shift, and extreme class imbalance; (2) the Classification benchmark, which follows a standard fine-grained visual categorization setup; and (3) the Few-shot benchmark, which targets very fine-grained classes with only a few training examples. Advances on the Monitoring benchmark can translate directly into improved aquatic biodiversity monitoring, which is an important component of regular legislative water quality assessment in many countries.
