A Framework for Multi-View Multiple Object Tracking using Single-View Multi-Object Trackers on Fish Data

Chaim Chai Elchik, Fatemeh Karimi Nejadasl, Seyed Sahand Mohammadi Ziabari, Ali Mohammed Mansoor Alsahag

TL;DR

This work tackles multi-object tracking (MOT) for small, visually similar underwater fish by adapting state-of-the-art single-view trackers (FairMOT, YOLOv8) within a stereo multi-view framework to produce 3D outputs. It builds a pipeline that trains YOLOv8, tracks with ByteTrack, applies post-track re-identification (re-ID), and performs stereo matching to triangulate 3D coordinates, enabling richer behavioral analysis. Evaluation with standard MOT metrics (HOTA, DetA, AssA, MOTA, IDF1) shows strong precision but limited recall in the single-view setting, while the multi-view framework yields depth information for a subset of tracks and reduces identity fragmentation through re-ID. The results demonstrate the feasibility of leveraging single-view MOT components to create a cross-view, 3D-aware tracking framework for underwater ecological studies, with clear directions for data, hardware, and methodological improvements to enhance generalization and robustness.
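To make the "strong precision but limited recall" observation concrete, the sketch below shows how precision, recall, and MOTA relate to raw match counts, using the standard CLEAR-MOT definition of MOTA. The function name and the example counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def detection_scores(tp: int, fp: int, fn: int, id_switches: int) -> dict:
    """Compute precision, recall, and MOTA from aggregate counts.

    tp, fp, fn are true positives, false positives, and false negatives
    summed over all frames; id_switches is the number of identity switches.
    All inputs here are illustrative assumptions, not values from the paper.
    """
    gt = tp + fn  # total number of ground-truth boxes
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / gt if gt else 0.0
    # CLEAR-MOT definition: MOTA = 1 - (FN + FP + IDSW) / GT
    mota = 1.0 - (fn + fp + id_switches) / gt if gt else 0.0
    return {"precision": precision, "recall": recall, "mota": mota}


# Hypothetical counts: few false positives, many missed fish.
scores = detection_scores(tp=50, fp=5, fn=50, id_switches=5)
print(scores)
```

With counts like these, precision stays high while recall and MOTA drop, because every missed fish counts against both recall and MOTA.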

Abstract

Multi-object tracking (MOT) in computer vision has made significant advancements, yet tracking small fish in underwater environments presents unique challenges due to complex 3D motions and data noise. Traditional single-view MOT models often fall short in these settings. This thesis addresses these challenges by adapting state-of-the-art single-view MOT models, FairMOT and YOLOv8, for underwater fish detection and tracking in ecological studies. The core contribution of this research is the development of a multi-view framework that utilizes stereo video inputs to enhance tracking accuracy and fish behavior pattern recognition. By integrating and evaluating these models on underwater fish video datasets, the study aims to demonstrate significant improvements in precision and reliability compared to single-view approaches. The proposed framework detects fish entities with a relative accuracy of 47% and employs stereo-matching techniques to produce a novel 3D output, providing a more comprehensive understanding of fish movements and interactions.
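As a rough illustration of how a stereo match can yield a 3D position, the sketch below triangulates one point from a rectified stereo pair under a pinhole camera model. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name, the camera parameters (focal length in pixels, baseline in meters, principal point), and the example coordinates are all hypothetical, and underwater refraction is ignored.

```python
from typing import Optional, Tuple


def triangulate_rectified(
    x_left: float,
    y_left: float,
    x_right: float,
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> Optional[Tuple[float, float, float]]:
    """Return (X, Y, Z) in meters in the left-camera frame, or None.

    Assumes a rectified pair, so matched points share the same image row
    and depth follows from horizontal disparity alone: Z = f * B / d.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # Zero or negative disparity gives no valid depth (bad match or
        # a point at infinity), so no 3D coordinate is produced.
        return None
    z = focal_px * baseline_m / disparity
    x = (x_left - cx) * z / focal_px
    y = (y_left - cy) * z / focal_px
    return (x, y, z)


# Hypothetical example: the center of a fish bounding box seen in both views.
point = triangulate_rectified(
    x_left=700.0, y_left=400.0, x_right=660.0,
    focal_px=800.0, baseline_m=0.1, cx=640.0, cy=360.0,
)
print(point)
```

Returning None for non-positive disparity mirrors the situation described in the TL;DR, where depth is available only for a subset of tracks.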

Paper Structure

This paper contains 48 sections, 5 equations, 19 figures, 7 tables.

Figures (19)

  • Figure 1: Model Output VS Ground Truth Identification for a Random Frame of Video 23_1
  • Figure 2: Fish margins of fish entity 7 in Video 23_1: The left image shows the YOLOv8 output and the right image shows the ground truth.
  • Figure 3: Missed Identification of fish entity 4 in Video 23_1: The left image shows the YOLOv8 output and the right image shows the ground truth.
  • Figure 4: False Positive Identification of fish entity in Video 10_2: The left image shows the YOLOv8 output and the right image shows the ground truth.
  • Figure 5: Tracking Before and After Re-Identification Video 129_1
  • ...and 14 more figures