AutoFish: Dataset and Benchmark for Fine-grained Analysis of Fish

Stefan Hein Bengtson; Daniel Lehotský; Vasiliki Ismiroglou; Niels Madsen; Thomas B. Moeslund; Malte Pedersen

TL;DR

AutoFish addresses automated, fine-grained catch documentation in support of sustainable fisheries. It introduces a public dataset of 1,500 RGB images of 454 fish, recorded on a conveyor belt in a controlled environment and annotated with per-fish IDs, instance segmentation masks, and length measurements. The authors establish baselines for instance segmentation (Mask2Former with Swin-B) and for length estimation (mask skeletonization and MobileNetV2-based regression), reporting a best mAP of 89.15% and a best MAE of 0.62 cm on images without occlusion and 1.38 cm on images with occlusion. The work demonstrates the feasibility of automated fish documentation, provides the annotations in COCO format, and discusses using the IDs for fish-level analyses and potential re-identification to improve accuracy. The dataset and baselines are intended as a foundation for scalable, transparent fisheries monitoring and for future research in automated catch documentation and per-fish tracking.

Abstract

Automated fish documentation is expected to play an essential role in sustainable fisheries management and in addressing overfishing in the near future. In this paper, we present a novel and publicly available dataset named AutoFish, designed for fine-grained fish analysis. The dataset comprises 1,500 images of 454 specimens of visually similar fish placed in various constellations on a white conveyor belt and annotated with instance segmentation masks, IDs, and length measurements. The data was collected in a controlled environment using an RGB camera. The annotation procedure involved manual point annotations, initial segmentation masks proposed by the Segment Anything Model (SAM), and subsequent manual correction of the masks. We establish baseline instance segmentation results using two variations of the Mask2Former architecture, with the best-performing model reaching an mAP of 89.15%. Additionally, we present two baseline length estimation methods, the best-performing being a custom MobileNetV2-based regression model that reaches an MAE of 0.62 cm in images with no occlusion and 1.38 cm in images with occlusion. Link to project page: https://vap.aau.dk/autofish/.
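The length results above are reported as mean absolute error (MAE), split by whether the image contains occlusion. A minimal sketch of that metric follows, assuming predicted and measured lengths are available as plain arrays in centimetres; the function names and the sample values are hypothetical illustrations, not the authors' evaluation code or dataset values.

```python
import numpy as np


def mean_absolute_error(predicted_cm, measured_cm) -> float:
    """Mean of |predicted - measured| over all fish, in centimetres."""
    predicted = np.asarray(predicted_cm, dtype=float)
    measured = np.asarray(measured_cm, dtype=float)
    if predicted.shape != measured.shape:
        raise ValueError("predicted and measured lengths must have the same shape")
    return float(np.mean(np.abs(predicted - measured)))


def mae_by_occlusion(predicted_cm, measured_cm, occluded) -> dict:
    """Report MAE separately for non-occluded and occluded samples."""
    predicted = np.asarray(predicted_cm, dtype=float)
    measured = np.asarray(measured_cm, dtype=float)
    occluded = np.asarray(occluded, dtype=bool)
    return {
        "no_occlusion": mean_absolute_error(predicted[~occluded], measured[~occluded]),
        "occlusion": mean_absolute_error(predicted[occluded], measured[occluded]),
    }


# Made-up lengths for illustration only (not taken from AutoFish).
example = mae_by_occlusion(
    predicted_cm=[30.5, 41.0, 25.0, 33.0],
    measured_cm=[30.0, 40.0, 27.0, 36.0],
    occluded=[False, False, True, True],
)
```

Reporting the two subsets separately, as the abstract does, keeps the harder occluded cases from being averaged away by the easier ones.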

Paper Structure (19 sections, 10 figures, 4 tables)

Introduction
Related work
Dataset
Fish composition
Camera setup
Image collection
Annotation procedure
Methods
Instance segmentation
Length estimation
Mask skeletonization (SKL)
CNN-based length regression (REG)
Results
Instance segmentation
Length estimation
...and 4 more sections

Figures (10)

Figure 1: Illustration of the recording setup and an example image from the AutoFish dataset with an overlay of ground-truth bounding boxes, instance segmentations, IDs, and lengths.
Figure 2: The distribution of species in the AutoFish dataset. The members of the true cod family are highlighted with a black border. The numbers inside the chart indicate the number of specimens. The average length is indicated for each of the species above the image examples.
Figure 3: The AutoFish dataset contains 25 groups of fish. Each group consists of three subsets of images, namely, Set1 and Set2, which contain one half of the fish each, and All, which contains all of the group's fish.
Figure 4: The central line is first identified by fitting a polynomial (orange) to the skeleton of the mask (green). The polynomial is then evaluated based on the convex hull of the mask (blue) to handle forked caudal fins and occlusions.
Figure 5: Overview of the CNN-based regression model (REG).
...and 5 more figures
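The Figure 4 caption describes fitting a polynomial to the mask skeleton and evaluating it over the extent of the mask. The sketch below illustrates that general idea with NumPy only and is not the authors' SKL implementation. It makes several simplifying assumptions: the fish lies roughly horizontally, the per-column mean row index stands in for a true morphological skeleton, and the horizontal extent of the mask stands in for the convex hull. The function name `estimate_length_px` and its parameters are hypothetical, and the result is in pixels; converting to centimetres would need a camera calibration that is not covered here.

```python
import numpy as np


def estimate_length_px(mask: np.ndarray, degree: int = 3, samples: int = 200) -> float:
    """Approximate the length of a binary fish mask along a fitted central line."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask is empty")

    # Stand-in for a skeleton: mean row index of the mask in each occupied column.
    cols = np.unique(xs)
    centre_rows = np.array([ys[xs == c].mean() for c in cols])

    # Fit a polynomial to the central-line points; lower the degree for tiny masks.
    fit_degree = min(degree, cols.size - 1)
    coeffs = np.polyfit(cols, centre_rows, fit_degree)

    # Evaluate the polynomial over the full horizontal extent of the mask
    # (a simplification of evaluating it against the convex hull).
    x = np.linspace(cols.min(), cols.max(), samples)
    y = np.polyval(coeffs, x)

    # Arc length of the sampled curve, in pixels.
    return float(np.sum(np.hypot(np.diff(x), np.diff(y))))


# Synthetic example: a straight, horizontal "fish" spanning columns 40 to 139.
demo_mask = np.zeros((50, 200), dtype=bool)
demo_mask[20:30, 40:140] = True
demo_length = estimate_length_px(demo_mask)
```

For the straight synthetic mask, the estimate is the distance between the first and last occupied pixel centres. A curved fish yields a longer arc than its horizontal extent, which is the reason for fitting a curve rather than measuring a bounding box.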
