
Livestock Fish Larvae Counting using DETR and YOLO based Deep Networks

Daniel Ortega de Carvalho, Luiz Felipe Teodoro Monteiro, Fernanda Marques Bazilio, Gabriel Toshio Hirokawa Higa, Hemerson Pistori

TL;DR

This study tackles automated fish larvae counting by comparing transformer- and CNN-based detectors (YOLOv8 variants, RT-DETR, DETR-ResNet-50, Deformable DETR) on a new, realistic smartphone image dataset (162 images of spotted sorubim and dourado larvae). Image tiling is used to handle high-resolution imagery under hardware constraints, with two tiling schemes and a 60%-area retention rule for annotations that cross fragment (tile) boundaries. The best models reach a mean absolute percentage error (MAPE) of $4.46\%$ with RT-DETR and $4.71\%$ with a medium YOLOv8, with $R^2$ around $0.98$ for the top models. The work demonstrates that larval counting is practical under realistic data collection conditions and highlights avenues for dataset expansion and non-sampling counting approaches in aquaculture.
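The tiling and annotation-retention step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 640-pixel tile size, the non-overlapping grid, the clipping of kept boxes to the tile, and the inclusive threshold are all assumptions. Only the 60% area figure comes from the text.

```python
from typing import Dict, List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def tile_origins(length: int, tile: int) -> List[int]:
    """Start offsets of non-overlapping tiles covering `length` pixels."""
    return list(range(0, length, tile))


def boxes_for_tile(boxes: List[Box], tile: Box, min_keep: float = 0.6) -> List[Box]:
    """Keep boxes with at least `min_keep` of their area inside `tile`.

    Kept boxes are clipped to the tile and returned in tile-local coordinates.
    """
    tx0, ty0, tx1, ty1 = tile
    kept = []
    for x0, y0, x1, y1 in boxes:
        area = (x1 - x0) * (y1 - y0)
        if area <= 0:
            continue  # skip degenerate annotations
        # Intersection of the annotation with the tile.
        cx0, cy0 = max(x0, tx0), max(y0, ty0)
        cx1, cy1 = min(x1, tx1), min(y1, ty1)
        inter = max(0.0, cx1 - cx0) * max(0.0, cy1 - cy0)
        if inter / area >= min_keep:
            kept.append((cx0 - tx0, cy0 - ty0, cx1 - tx0, cy1 - ty0))
    return kept


def tile_annotations(
    width: int,
    height: int,
    boxes: List[Box],
    tile_size: int = 640,  # hypothetical fixed tile size
    min_keep: float = 0.6,
) -> Dict[Box, List[Box]]:
    """Map each tile (x0, y0, x1, y1) to the annotations retained in it."""
    out: Dict[Box, List[Box]] = {}
    for ty in tile_origins(height, tile_size):
        for tx in tile_origins(width, tile_size):
            tile = (tx, ty, min(tx + tile_size, width), min(ty + tile_size, height))
            out[tile] = boxes_for_tile(boxes, tile, min_keep)
    return out


if __name__ == "__main__":
    # One larva mostly in the right tile, one split evenly across the boundary.
    annotations = [(610, 100, 710, 200), (590, 300, 690, 400)]
    for tile, kept in tile_annotations(1280, 640, annotations).items():
        print(tile, kept)
```

Under these assumptions, a box split evenly between two tiles is retained in neither, so a rule of this kind trades a few lost annotations for not training on heavily truncated larvae.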

Abstract

Counting fish larvae is an important, yet demanding and time-consuming, task in aquaculture. To address this problem, we evaluate four neural network architectures, including convolutional neural networks and transformers, in different sizes, on the task of fish larvae counting. For the evaluation, we present a new annotated image dataset with fewer data collection requirements than preceding works, with images of spotted sorubim and dourado larvae. By using image tiling techniques, we achieve a MAPE of 4.46% ($\pm 4.70$) with an extra-large real-time detection transformer, and 4.71% ($\pm 4.98$) with a medium-sized YOLOv8.
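For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute MAPE over per-image counts, together with a spread value and $R^2$. It assumes that the $\pm$ figure is the standard deviation of the per-image percentage errors and that $R^2$ is the coefficient of determination between true and predicted counts. Neither is confirmed by the text here, and the function names and sample counts are illustrative only.

```python
import math
from typing import Sequence, Tuple


def percentage_errors(true_counts: Sequence[float], pred_counts: Sequence[float]) -> list:
    """Absolute percentage error per image; true counts must be non-zero."""
    return [100.0 * abs(t - p) / t for t, p in zip(true_counts, pred_counts)]


def mape_with_std(true_counts: Sequence[float], pred_counts: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the per-image percentage errors."""
    errs = percentage_errors(true_counts, pred_counts)
    mean = sum(errs) / len(errs)
    var = sum((e - mean) ** 2 for e in errs) / len(errs)
    return mean, math.sqrt(var)


def r_squared(true_counts: Sequence[float], pred_counts: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = sum(true_counts) / len(true_counts)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts))
    ss_tot = sum((t - mean_t) ** 2 for t in true_counts)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up counts for three images, not data from the paper.
    true_counts = [100, 200, 50]
    pred_counts = [95, 204, 50]
    mape, spread = mape_with_std(true_counts, pred_counts)
    print(f"MAPE = {mape:.2f}% (+/- {spread:.2f})")
    print(f"R^2  = {r_squared(true_counts, pred_counts):.4f}")
```

Because each error is divided by the true count, an image with few larvae contributes a large percentage error for a small absolute miss, which can make the spread comparable in size to the mean.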

Paper Structure (10 sections, 7 figures, 3 tables)


Figures (7)

  • Figure 1: An illustration of the sampling process used to obtain the larva bowls that were photographed, along with some of the bowls available for this purpose and utilized in this study.
  • Figure 2: Some of the images that constitute the full dataset. Because the images were not captured under ideal, standardized conditions, they vary in lighting and background, and they contain dirt and the dark eyes of streaked prochilod larvae.
  • Figure 3: Close-up examples of the larvae used in this study. Streaked prochilods were not counting targets in this study, and it is not obvious that the techniques used here would count them as effectively as they count spotted sorubim and dourado larvae. Nonetheless, this example shows that they can be treated as noise.
  • Figure 4: An example of fixed-size tiling applied on an image of the dataset. Another strategy used in this work was tiling according to a scale factor.
  • Figure 5: Examples of larvae counting with YOLOv8n. In these images, the percentage errors achieved were 5% and 2.41%.
  • ...and 2 more figures