Towards Phytoplankton Parasite Detection Using Autoencoders

Simon Bilik, Daniel Batrakhanov, Tuomas Eerola, Lumi Haraguchi, Kaisa Kraft, Silke Van den Wyngaert, Jonna Kangas, Conny Sjöqvist, Karin Madsen, Lasse Lensu, Heikki Kälviäinen, Karel Horak

TL;DR

The paper tackles the detection of phytoplankton parasites in imaging data, where anomalous examples are scarce. It proposes an unsupervised anomaly-detection pipeline based on a vector-quantized variational autoencoder (VQVAE) with HardNet feature extraction and Local Outlier Factor, and compares it to a supervised Faster R-CNN baseline. The unsupervised approach achieves an overall F1 score of 0.75 across nine phytoplankton species, with species-specific fine-tuning offering further gains, while Faster R-CNN reaches 0.86 when trained on plankton species and labeled anomalies. The work argues that the unsupervised method is more universal, since it can detect unknown anomalies without anomaly annotations, and that it supports scalable plankton monitoring. Code and data are publicly available.

Abstract

Phytoplankton parasites are largely understudied microbial components with a potentially significant ecological impact on phytoplankton bloom dynamics. To better understand their impact, we need improved detection methods to integrate phytoplankton-parasite interactions into the monitoring of aquatic ecosystems. Automated imaging devices usually produce large amounts of phytoplankton image data, while anomalous phytoplankton data occur rarely. Thus, we propose an unsupervised anomaly detection system based on the similarity of the original and autoencoder-reconstructed samples. With this approach, we were able to reach an overall F1 score of 0.75 across nine phytoplankton species, which could be further improved by species-specific fine-tuning. The proposed unsupervised approach was further compared with a supervised object detector based on Faster R-CNN. With this supervised approach and the model trained on plankton species and anomalies, we were able to reach the highest F1 score of 0.86. However, the unsupervised approach is expected to be more universal, as it can also detect unknown anomalies and does not require annotated anomalous data, which may not always be available in sufficient quantities. Although other studies have dealt with plankton anomaly detection in terms of non-plankton particles or air bubble detection, our paper is, to the best of our knowledge, the first to focus on automated anomaly detection of putative phytoplankton parasites or infections.
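
The idea of scoring a sample by how well an autoencoder trained on normal data reconstructs it can be illustrated with a toy sketch. This is not the authors' pipeline. The function `toy_reconstruct` is a hypothetical stand-in for a trained autoencoder (it simply clamps pixel values to an assumed "normal" range), mean squared error is swapped in for the similarity measure used in the paper, and the threshold value is arbitrary.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of intensities in [0, 1]


def reconstruction_error(original: Image, reconstructed: Image) -> float:
    """Mean squared difference between an image and its reconstruction."""
    total = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for p_o, p_r in zip(row_o, row_r):
            total += (p_o - p_r) ** 2
            count += 1
    return total / count


def toy_reconstruct(image: Image) -> Image:
    """Hypothetical stand-in for a trained autoencoder.

    Assumption: normal samples only contain intensities up to 0.5, so the
    model cannot reproduce brighter structures and clamps them instead.
    """
    return [[min(p, 0.5) for p in row] for row in image]


def is_anomalous(image: Image,
                 reconstruct: Callable[[Image], Image],
                 threshold: float) -> bool:
    """Flag a sample when its reconstruction error exceeds the threshold."""
    return reconstruction_error(image, reconstruct(image)) > threshold


normal_sample = [[0.1, 0.2], [0.3, 0.4]]
anomalous_sample = [[0.1, 0.2], [0.3, 1.0]]  # one bright, unexpected structure

THRESHOLD = 0.01  # arbitrary value chosen for this toy example
print(is_anomalous(normal_sample, toy_reconstruct, THRESHOLD))
print(is_anomalous(anomalous_sample, toy_reconstruct, THRESHOLD))
```

In this toy setup the normal sample is reproduced exactly and stays below the threshold, while the bright pixel in the second sample cannot be reconstructed and pushes the error above it. A real system would learn the reconstruction from data and choose the threshold on a validation set.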

Paper Structure (23 sections, 4 equations, 12 figures, 15 tables)

This paper contains 23 sections, 4 equations, 12 figures, 15 tables.

Figures (12)

  • Figure 1: Original (a), encoded space (b), reconstruction (c), and difference image (d) of an anomalous sample of the Centrales plankton species.
  • Figure 2: The proposed autoencoder-based anomaly detection pipeline.
  • Figure 3: Schemes of the modified autoencoder cores: (a) BAE1 core, (b) VAE2 core.
  • Figure 4: Example feature space of the Aphanizomenon plankton species.
  • Figure 5: Illustration of equal-error-rate (EER) threshold selection criterion on the ROC curve.
  • ...and 7 more figures
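
The equal-error-rate (EER) criterion named in the Figure 5 caption picks the operating point on the ROC curve where the false positive rate equals the false negative rate. The paper's exact procedure is not shown in this excerpt, so the following is a generic sketch. It assumes that higher scores mean "more anomalous" and that a sample is flagged when its score is at least the threshold; the function name `eer_threshold` and the example data are illustrative only.

```python
from typing import List, Tuple


def eer_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Return (threshold, fpr, fnr) at the point where |FPR - FNR| is smallest.

    scores: anomaly scores, higher means more anomalous (assumption).
    labels: 1 for anomalous, 0 for normal.
    A sample is predicted anomalous when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one normal and one anomalous sample")

    best = None
    # Every distinct score is a candidate threshold on the ROC curve.
    for threshold in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
        fpr, fnr = fp / n_neg, fn / n_pos
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, threshold, fpr, fnr)
    return best[1], best[2], best[3]


# Made-up scores: four normal samples followed by four anomalous ones.
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, fpr, fnr = eer_threshold(scores, labels)
print(threshold, fpr, fnr)
```

With finite data the two error rates rarely match exactly, which is why the sketch takes the candidate threshold with the smallest gap between them rather than requiring equality.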