MPT: A Large-scale Multi-Phytoplankton Tracking Benchmark

Yang Yu, Yuezun Li, Xin Sun, Junyu Dong

TL;DR

This work addresses the challenge of real-time plankton monitoring by introducing MPT, a large-scale synthetic video benchmark with 140 4K sequences across 27 species and 14 backgrounds, enabling robust evaluation of multi-object tracking in underwater environments. It also presents DSFT, a Deviation-Corrected Multi-Scale Feature Fusion tracker that combines a residual-predicting auxiliary extractor (DCM) with multi-scale feature fusion (MFSF) to mitigate focus shifts and the loss of small-object information during tracking. The authors validate MPT and DSFT through extensive experiments and ablations, showing substantial improvements over baselines and establishing a practical framework for real-time phytoplankton observation and monitoring. Overall, MPT provides a versatile resource bridging detection and tracking in marine contexts, while DSFT offers a specialized online MOT solution tailored to the unique challenges of plankton data and underwater backgrounds.

Abstract

Phytoplankton are a crucial component of aquatic ecosystems, and monitoring them effectively can provide valuable insights into ocean environments and ecosystem changes. Traditional phytoplankton monitoring methods are often complex and do not support timely analysis. Deep learning algorithms therefore offer a promising approach to automated phytoplankton monitoring. However, the lack of large-scale, high-quality training samples has become a major bottleneck in advancing phytoplankton tracking. In this paper, we propose a challenging benchmark dataset, Multiple Phytoplankton Tracking (MPT), which covers diverse backgrounds and variations in motion during observation. The dataset includes 27 species of phytoplankton and zooplankton, 14 different backgrounds that simulate diverse and complex underwater environments, and a total of 140 videos. To enable accurate real-time observation of phytoplankton, we introduce a multi-object tracking method, the Deviation-Corrected Multi-Scale Feature Fusion Tracker (DSFT), which addresses issues such as focus shifts during tracking and the loss of small-target information when computing frame-to-frame similarity. Specifically, we introduce an additional feature extractor that predicts the residuals of the standard feature extractor's output, and we compute multi-scale frame-to-frame similarity from features at different layers of the extractor. Extensive experiments on MPT demonstrate the validity of the dataset and the superiority of DSFT in tracking phytoplankton, providing an effective solution for phytoplankton monitoring.
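
The two ideas named in the abstract, residual correction of extracted features and multi-scale frame-to-frame similarity, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the additive form of the correction, the use of cosine similarity, the uniform weighting across scales, and all function names (`corrected_features`, `cosine_similarity_matrix`, `multi_scale_similarity`) are assumptions made purely for illustration.

```python
import numpy as np


def corrected_features(base, residual):
    """Apply a predicted residual to the base extractor's output.

    Assumption: the correction is a plain element-wise addition.
    """
    return base + residual


def cosine_similarity_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a (N, D) and rows of b (M, D)."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T


def multi_scale_similarity(prev_feats, curr_feats, weights=None):
    """Combine per-scale similarity matrices into a single (N, M) matrix.

    prev_feats / curr_feats: one array per scale, shaped (N, D_s) and (M, D_s).
    Assumption: scales are fused by a weighted sum (uniform by default).
    """
    if weights is None:
        weights = [1.0 / len(prev_feats)] * len(prev_feats)
    sims = [
        w * cosine_similarity_matrix(p, c)
        for w, p, c in zip(weights, prev_feats, curr_feats)
    ]
    return np.sum(sims, axis=0)


# Toy example: two objects described at two scales (shallow and deep).
prev = [np.array([[1.0, 0.0], [0.0, 1.0]]),
        np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])]
# The current frame contains the same two objects in swapped order.
curr = [f[::-1] for f in prev]

sim = multi_scale_similarity(prev, curr)
matches = sim.argmax(axis=1)  # best current-frame match for each previous object
```

In this toy case each previous-frame object matches the swapped index in the current frame. A real tracker would typically feed a matrix like `sim` into an assignment step rather than a per-row `argmax`.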

Paper Structure

This paper contains 21 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The 14 background images used in the dataset. The white and blue backgrounds vary in brightness and impurity level.
  • Figure 2: In real-world scenarios, overlapping phytoplankton individuals can lead to difficulties in identification (left image); different species of phytoplankton also exhibit significant size differences (right image).
  • Figure 3: Overview of DSFT. Lines of different colors in the figure do not overlap at the connection points. The red arrow represents the deepest feature map output by Conv1, the green arrow represents the intermediate feature map output by Conv1, and the blue arrow represents the shallow feature map output by Conv1. The gold arrow indicates the detection result generated by the head for the previous frame, which contains only positional information and no class information.
  • Figure 4: Flowchart of the DCM section. Lines of different colors in the figure do not overlap at the connection points. The red, green, and blue arrows represent the outputs from different layers of the Conv1 module.