Are All Marine Species Created Equal? Performance Disparities in Underwater Object Detection
Melanie Wille, Tobias Fischer, Scarlett Raine
TL;DR
This work investigates why some marine species are detected more reliably than others in underwater imagery by decomposing object detection into localization and classification. It systematically manipulates the DUO and RUOD-4C datasets to separate the effects of data quantity from those of intrinsic visual features, and uses YOLO11n together with the TIDE error-analysis toolkit to diagnose failures. The analysis shows that foreground-background discrimination is the main localization bottleneck, while intrinsic feature-based challenges and inter-class dependencies cause classification gaps that persist even with balanced data. In practice, the study recommends distribution-aware training (imbalanced data when high precision matters, balanced data when high recall matters) and targeted improvements to localization, and it releases its code and datasets to support reproducibility and further research.
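TIDE assigns each detection error to a single category, which is what allows a statement such as "foreground-background discrimination is the main bottleneck." The following is a minimal, simplified sketch of that per-detection assignment. It assumes a foreground IoU threshold of 0.5 and a background threshold of 0.1 (treat both values as assumptions), and it assumes boxes are given as `(x1, y1, x2, y2)`. The function names and class labels are hypothetical illustrations, not the TIDE toolkit's API or the authors' code. Background errors (detections firing on water or seabed) and missed objects (ground truths that no detection claims, counted separately and not shown) are the foreground-background failures.

```python
# Hypothetical sketch of TIDE-style error assignment; thresholds are assumptions.
T_F = 0.5  # IoU at or above which a detection "covers" an object
T_B = 0.1  # IoU below which a detection overlaps nothing meaningful

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def classify_detection(det_box: Box, det_cls: str,
                       gts: list[tuple[Box, str]], matched: set[int]) -> str:
    """Label one detection with a simplified TIDE-style outcome.

    gts     -- (box, class) ground-truth objects in the image
    matched -- indices of ground truths already claimed by higher-scoring detections
    """
    ious = [iou(det_box, box) for box, _ in gts]
    same = [i for i, (_, cls) in enumerate(gts) if cls == det_cls]
    other = [i for i, (_, cls) in enumerate(gts) if cls != det_cls]

    # Right class, well localized: a hit unless that object was already found.
    good = [i for i in same if ious[i] >= T_F]
    if good:
        return "true_positive" if any(i not in matched for i in good) else "duplicate"
    # Well localized, but on an object of a different class.
    if any(ious[i] >= T_F for i in other):
        return "classification"
    # Right class, box overlaps the object but not tightly enough.
    if any(ious[i] >= T_B for i in same):
        return "localization"
    # Wrong class and loosely localized.
    if any(ious[i] >= T_B for i in other):
        return "both"
    # No meaningful overlap with any object: the detector fired on background.
    return "background"
```

In a full evaluation the detections would be processed in descending score order, adding each true positive's ground-truth index to `matched`; that bookkeeping is omitted here to keep the categories visible.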
Abstract
Underwater object detection is critical for monitoring marine ecosystems but poses unique challenges, including degraded image quality, imbalanced class distributions, and distinct visual characteristics. Not every species is detected equally well, yet the underlying causes remain unclear. We address two key research questions: 1) What factors beyond data quantity drive class-specific performance disparities? 2) How can we systematically improve detection of under-performing marine species? We manipulate the DUO and RUOD datasets to separate the object detection task into localization and classification, and investigate the under-performance of the scallop class. Localization analysis using YOLO11 and TIDE finds that foreground-background discrimination is the most problematic stage regardless of data quantity. Classification experiments reveal persistent precision gaps even with balanced data, indicating intrinsic feature-based challenges and inter-class dependencies that go beyond data scarcity. We recommend imbalanced distributions when prioritizing precision and balanced distributions when prioritizing recall. Efforts to improve under-performing classes should focus on algorithmic advances, especially in localization modules. We publicly release our code and datasets.
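Separating data quantity from intrinsic class difficulty requires training on controlled class distributions. One simple way to build a balanced variant is sketched below. It assumes the data are per-object samples labelled by class (as in a classification-only setup) and that "balanced" means randomly downsampling every class to the size of the rarest one. `balance_by_downsampling` is a hypothetical helper, not the authors' released code, and the released datasets may have been constructed differently.

```python
import random
from collections import defaultdict


def balance_by_downsampling(samples: list[tuple[object, str]],
                            seed: int = 0) -> list[tuple[object, str]]:
    """Return a class-balanced subset of (item, label) pairs.

    Every class is randomly downsampled to the count of the rarest class,
    so all classes contribute the same number of samples. A fixed seed
    makes the subset reproducible across runs.
    """
    if not samples:
        return []
    # Group samples by their class label.
    by_label: dict[str, list[tuple[object, str]]] = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    # Target size per class: the size of the smallest class.
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced: list[tuple[object, str]] = []
    # Iterate in sorted label order so the output order is deterministic.
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_min))
    return balanced
```

The imbalanced variant is simply the original sample list; training one model on each and comparing per-class precision and recall sets up the kind of comparison behind the distribution recommendation. Downsampling discards data from frequent classes; oversampling rare classes is a common alternative but repeats identical samples.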
