AutoFish: Dataset and Benchmark for Fine-grained Analysis of Fish
Stefan Hein Bengtson, Daniel Lehotský, Vasiliki Ismiroglou, Niels Madsen, Thomas B. Moeslund, Malte Pedersen
TL;DR
AutoFish tackles automated, fine-grained catch documentation in support of sustainable fisheries by introducing a public dataset of 1,500 RGB images of 454 fish, annotated with per-fish IDs, instance segmentation masks, and length measurements, and collected on a conveyor belt in a controlled environment. The authors establish baselines for instance segmentation (Mask2Former with Swin-B) and length estimation (skeletonization and MobileNetV2 regression). The best segmentation model reaches an mAP of 89.15%, and the best length-estimation model reaches an MAE of 0.62 cm in images without occlusion and 1.38 cm in images with occlusion. The work demonstrates the feasibility of automated fish documentation, provides extensive annotations in COCO format, and discusses how the IDs can support fish-level analyses and potential re-identification to improve accuracy. These contributions support scalable, transparent monitoring in fisheries and provide a data-rich foundation for future research in automated catch documentation and per-fish tracking.
Abstract
Automated fish documentation processes are expected to play an essential role in sustainable fisheries management and in addressing the challenges of overfishing in the near future. In this paper, we present a novel and publicly available dataset named AutoFish, designed for fine-grained fish analysis. The dataset comprises 1,500 images of 454 specimens of visually similar fish placed in various constellations on a white conveyor belt and annotated with instance segmentation masks, IDs, and length measurements. The data was collected in a controlled environment using an RGB camera. The annotation procedure involved manual point annotations, initial segmentation masks proposed by the Segment Anything Model (SAM), and subsequent manual correction of the masks. We establish baseline instance segmentation results using two variations of the Mask2Former architecture, with the best-performing model reaching an mAP of 89.15%. Additionally, we present two baseline length estimation methods, the best-performing being a custom MobileNetV2-based regression model reaching an MAE of 0.62 cm in images with no occlusion and 1.38 cm in images with occlusion. Link to project page: https://vap.aau.dk/autofish/.
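The length-estimation results above are reported as mean absolute error (MAE), separately for images with and without occlusion. A minimal sketch of how such a per-group MAE could be computed is given below. The function name `mae_by_group`, the group labels, and the toy values are assumptions made for illustration only; they are not taken from the AutoFish dataset or from the authors' evaluation code.

```python
from collections import defaultdict


def mae_by_group(records):
    """Compute the mean absolute error per group.

    `records` is an iterable of (group, true_length_cm, predicted_length_cm)
    tuples. The tuple layout is a hypothetical choice for this sketch.
    """
    abs_errors = defaultdict(list)
    for group, true_cm, pred_cm in records:
        abs_errors[group].append(abs(pred_cm - true_cm))
    # Average the absolute errors within each group.
    return {group: sum(errs) / len(errs) for group, errs in abs_errors.items()}


# Hypothetical toy values, NOT measurements from AutoFish.
toy_records = [
    ("no_occlusion", 30.0, 30.5),
    ("no_occlusion", 25.0, 24.5),
    ("occlusion", 28.0, 29.5),
    ("occlusion", 32.0, 31.0),
]

result = mae_by_group(toy_records)
print(result)
```

Reporting the two groups separately, rather than as one pooled MAE, makes the effect of occlusion on length estimation visible.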
