AutoFish: Dataset and Benchmark for Fine-grained Analysis of Fish

Stefan Hein Bengtson, Daniel Lehotský, Vasiliki Ismiroglou, Niels Madsen, Thomas B. Moeslund, Malte Pedersen

TL;DR

AutoFish tackles automated, fine-grained catch documentation in support of sustainable fisheries. It introduces a public dataset of 1,500 RGB images of 454 fish, recorded on a conveyor belt in a controlled environment and annotated with per-fish IDs, instance segmentation masks, and length measurements. The authors establish baselines for instance segmentation (Mask2Former with Swin-B) and for length estimation (skeletonization and MobileNetV2 regression). The best segmentation model reaches an mAP of 89.15%, and the best length model reaches an MAE of 0.62 cm on images without occlusion and 1.38 cm on images with occlusion. The work demonstrates the feasibility of automated fish documentation, provides annotations in COCO format, and discusses using the IDs for fish-level analyses and for re-identification as a possible way to improve accuracy. The dataset is intended as a foundation for future research in automated catch documentation and per-fish tracking.

Abstract

Automated fish documentation processes are expected to play an essential role in sustainable fisheries management in the near future and in addressing the challenges of overfishing. In this paper, we present a novel and publicly available dataset named AutoFish designed for fine-grained fish analysis. The dataset comprises 1,500 images of 454 specimens of visually similar fish placed in various constellations on a white conveyor belt and annotated with instance segmentation masks, IDs, and length measurements. The data was collected in a controlled environment using an RGB camera. The annotation procedure involved manual point annotations, initial segmentation masks proposed by the Segment Anything Model (SAM), and subsequent manual correction of the masks. We establish baseline instance segmentation results using two variations of the Mask2Former architecture, with the best-performing model reaching an mAP of 89.15%. Additionally, we present two baseline length estimation methods, the best-performing being a custom MobileNetV2-based regression model reaching an MAE of 0.62 cm in images with no occlusion and 1.38 cm in images with occlusion. Link to project page: https://vap.aau.dk/autofish/.
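
The abstract reports the length-estimation MAE separately for images with and without occlusion. As a minimal sketch of how such a split could be computed, the snippet below groups predictions by an occlusion flag and averages the absolute errors per group. The function names, the `(predicted_cm, measured_cm, is_occluded)` tuple layout, and the sample values are assumptions made for illustration; they are not taken from the AutoFish code or data.

```python
from statistics import mean


def mean_absolute_error(predicted_cm, measured_cm):
    """Mean absolute difference between predicted and measured lengths (cm)."""
    if len(predicted_cm) != len(measured_cm):
        raise ValueError("predicted and measured lists must have equal length")
    if not predicted_cm:
        raise ValueError("at least one sample is required")
    return mean(abs(p - m) for p, m in zip(predicted_cm, measured_cm))


def mae_by_occlusion(samples):
    """Compute MAE separately for occluded and non-occluded fish.

    `samples` is an iterable of (predicted_cm, measured_cm, is_occluded)
    tuples. This layout is hypothetical, not the dataset's own format.
    """
    groups = {False: ([], []), True: ([], [])}
    for predicted, measured, is_occluded in samples:
        predictions, measurements = groups[bool(is_occluded)]
        predictions.append(predicted)
        measurements.append(measured)

    # Skip a group entirely if it received no samples.
    return {
        ("occluded" if flag else "not_occluded"): mean_absolute_error(p, m)
        for flag, (p, m) in groups.items()
        if p
    }


# Made-up example values, not results from the paper.
example_samples = [
    (30.0, 31.0, False),
    (25.0, 24.0, False),
    (40.0, 43.0, True),
]
example_result = mae_by_occlusion(example_samples)
```

Reporting the two groups separately keeps the harder occluded cases from being averaged away by the easier non-occluded ones.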

Paper Structure (19 sections, 10 figures, 4 tables)

This paper contains 19 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Illustration of the recording setup and an example image from the AutoFish dataset with an overlay of ground-truth bounding boxes, instance segmentations, IDs, and lengths.
  • Figure 2: The distribution of species in the AutoFish dataset. The members of the true cod family are highlighted with a black border. The numbers inside the chart indicate the number of specimens. The average length is indicated for each of the species above the image examples.
  • Figure 3: The AutoFish dataset contains 25 groups of fish. Each group consists of three subsets of images, namely, Set1 and Set2, which contain one half of the fish each, and All, which contains all of the group's fish.
  • Figure 4: The central line is identified by fitting a polynomial (orange) to the skeleton of the mask (green). The polynomial is then evaluated based on the convex hull of the mask (blue) to handle forked caudal fins and occlusions.
  • Figure 5: Overview of the CNN-based regression model (REG).
  • ...and 5 more figures
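
The caption of Figure 4 describes fitting a polynomial to the mask skeleton and evaluating it based on the convex hull of the mask. The snippet below is a rough sketch of that idea, not the authors' implementation. It assumes the fish lies roughly horizontally in the image, uses the horizontal extent of the mask points as a simple stand-in for the convex-hull bounds, and takes skeleton and mask pixel coordinates as already extracted inputs. The function name, the polynomial degree, the sample count, and the `cm_per_pixel` scale are all hypothetical.

```python
import numpy as np


def centerline_length(skeleton_xy, mask_xy, degree=3, samples=200, cm_per_pixel=1.0):
    """Estimate fish length as the arc length of a polynomial central line.

    skeleton_xy: (N, 2) array-like of (x, y) skeleton pixel coordinates.
    mask_xy:     (M, 2) array-like of (x, y) mask pixel coordinates.
    Assumes a roughly horizontal fish so that y can be modelled as a
    function of x (an assumption of this sketch).
    """
    skeleton_xy = np.asarray(skeleton_xy, dtype=float)
    mask_xy = np.asarray(mask_xy, dtype=float)

    # Fit y = f(x) to the skeleton points by least squares.
    coefficients = np.polyfit(skeleton_xy[:, 0], skeleton_xy[:, 1], degree)

    # Evaluate the polynomial over the horizontal extent of the mask, which
    # equals the horizontal extent of its convex hull. The skeleton usually
    # stops short of the snout and tail, so this extends the line outward.
    x_min, x_max = mask_xy[:, 0].min(), mask_xy[:, 0].max()
    xs = np.linspace(x_min, x_max, samples)
    ys = np.polyval(coefficients, xs)

    # Approximate the arc length by summing short straight segments.
    length_px = np.sum(np.hypot(np.diff(xs), np.diff(ys)))
    return float(length_px * cm_per_pixel)


# Toy input: a flat skeleton from x=2 to x=8 inside a mask spanning x=0 to x=10.
toy_skeleton = [(x, 5.0) for x in range(2, 9)]
toy_mask = [(0.0, 5.0), (10.0, 5.0), (5.0, 3.0), (5.0, 7.0)]
toy_length = centerline_length(toy_skeleton, toy_mask, degree=1)
```

Summing short segments avoids needing a closed-form arc-length integral, at the cost of a small discretization error that shrinks as `samples` grows.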