Livestock Fish Larvae Counting using DETR and YOLO based Deep Networks
Daniel Ortega de Carvalho, Luiz Felipe Teodoro Monteiro, Fernanda Marques Bazilio, Gabriel Toshio Hirokawa Higa, Hemerson Pistori
TL;DR
This study tackles automated fish larvae counting by comparing transformer- and CNN-based detectors (YOLOv8 variants, RT-DETR, DETR-ResNet-50, Deformable DETR) on a new dataset of 162 realistic smartphone images of spotted sorubim and dourado larvae. Image tiling is used to handle high-resolution imagery under hardware constraints. The study tests two tiling schemes and uses a 60%-area retention rule for annotations that cross fragment boundaries. The best models reach a mean absolute percentage error (MAPE) as low as $4.46\%$ with RT-DETR and $4.71\%$ with a medium YOLOv8, with $R^2 \approx 0.98$ for the top models. The work shows that larval counting is practical under realistic data collection conditions and points to dataset expansion and non-sampling counting approaches as future directions in aquaculture.
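The tiling step with an area-retention rule can be illustrated with a short sketch. It assumes a simple non-overlapping `rows x cols` grid (the paper's two actual tiling schemes are not specified here) and pixel boxes given as `(x1, y1, x2, y2)`. It also assumes the rule keeps a box in a tile only when at least 60% of the box's area falls inside that tile. The function name `tile_annotations` and its parameters are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixel coordinates
TileRect = Tuple[int, int, int, int]     # tile bounds in image coordinates


def tile_annotations(
    img_w: int,
    img_h: int,
    boxes: List[Box],
    rows: int,
    cols: int,
    min_area_frac: float = 0.6,  # assumed reading of the "60%-area retention rule"
) -> List[Tuple[TileRect, List[Box]]]:
    """Hypothetical sketch: split an image into a rows x cols grid and assign boxes.

    A box is kept in a tile only if at least `min_area_frac` of its original
    area lies inside that tile. Kept boxes are clipped to the tile and shifted
    into tile-local coordinates. Boxes split more evenly are dropped everywhere.
    """
    tile_w = img_w / cols
    tile_h = img_h / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tx1, ty1 = round(c * tile_w), round(r * tile_h)
            tx2, ty2 = round((c + 1) * tile_w), round((r + 1) * tile_h)
            kept = []
            for (x1, y1, x2, y2) in boxes:
                area = (x2 - x1) * (y2 - y1)
                if area <= 0:
                    continue  # skip degenerate annotations
                # Intersection of the box with the current tile.
                ix1, iy1 = max(x1, tx1), max(y1, ty1)
                ix2, iy2 = min(x2, tx2), min(y2, ty2)
                inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
                if inter / area >= min_area_frac:
                    # Clip to the tile and convert to tile-local coordinates.
                    kept.append((ix1 - tx1, iy1 - ty1, ix2 - tx1, iy2 - ty1))
            tiles.append(((tx1, ty1, tx2, ty2), kept))
    return tiles
```

Consider a 200x100 image cut into two tiles. A box straddling the border with two thirds of its area in the left tile is kept there and clipped. A box split 50/50 is dropped from both tiles, which is one trade-off of a threshold rule like this.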
Abstract
Counting fish larvae is an important, yet demanding and time-consuming, task in aquaculture. To address this problem, we evaluate four neural network architectures, including convolutional neural networks and transformers, at different sizes on the task of fish larvae counting. For the evaluation, we present a new annotated image dataset of spotted sorubim and dourado larvae that has fewer data collection requirements than preceding works. Using image tiling techniques, we achieve a mean absolute percentage error (MAPE) of 4.46% ($\pm 4.70$) with an extra-large real-time detection transformer (RT-DETR), and 4.71% ($\pm 4.98$) with a medium-sized YOLOv8.
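The reported figures can be computed from per-image counts with a sketch like the one below. It assumes three things. First, the value in parentheses is the standard deviation of the per-image absolute percentage errors (population version here; the paper may use the sample version). Second, $R^2$ is the coefficient of determination of predicted against ground-truth counts. Third, every ground-truth count is positive. The function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def mape_with_std(true_counts: Sequence[int], pred_counts: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical helper: mean and population std of per-image absolute percentage errors."""
    if len(true_counts) != len(pred_counts) or not true_counts:
        raise ValueError("need two non-empty sequences of equal length")
    if any(t <= 0 for t in true_counts):
        raise ValueError("ground-truth counts must be positive for MAPE")
    # Per-image error in percent: |predicted - true| / true * 100.
    errors = [abs(p - t) / t * 100.0 for t, p in zip(true_counts, pred_counts)]
    return statistics.fmean(errors), statistics.pstdev(errors)


def r_squared(true_counts: Sequence[int], pred_counts: Sequence[int]) -> float:
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = statistics.fmean(true_counts)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts))
    ss_tot = sum((t - mean_t) ** 2 for t in true_counts)
    return 1.0 - ss_res / ss_tot
```

For example, ground-truth counts `[100, 200, 50]` against predictions `[95, 210, 50]` give per-image errors of 5%, 5% and 0%. The MAPE is then their mean, about 3.33%.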
