
DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation

Daniel Batrakhanov, Tuomas Eerola, Kaisa Kraft, Lumi Haraguchi, Lasse Lensu, Sanna Suikkanen, María Teresa Camarena-Gómez, Jukka Seppälä, Heikki Kälviäinen

TL;DR

DAPlankton addresses the domain shift that different imaging instruments cause in plankton recognition by introducing a benchmark with two subsets, DAPlankton_LAB and DAPlankton_SEA, which capture cultured and natural Baltic Sea data across multiple instruments. The authors provide an evaluation protocol for unsupervised closed-set domain adaptation and report a preliminary benchmark of three baseline methods (Deep CORAL, CDAN, Deep MEDA) using AlexNet and ResNet-18. The findings show that existing DA methods improve over non-adaptive baselines but struggle with fine-grained, imbalanced, multi-instrument plankton data, highlighting the need for novel approaches. The publicly released dataset enables reproducible benchmarking and motivates the development of methods capable of robust cross-instrument plankton recognition.

Abstract

Plankton recognition opens new possibilities for studying various environmental aspects and offers an interesting real-world context for developing domain adaptation (DA) methods. Different imaging instruments cause domain shift between datasets, which hampers the development of general plankton recognition methods. A promising remedy is DA, which allows a model trained on one instrument to be adapted to other instruments. In this paper, we present a new DA dataset called DAPlankton, which consists of phytoplankton images obtained with different instruments. Phytoplankton poses a challenging DA problem due to the fine-grained nature of the task and the high class imbalance in real-world datasets. DAPlankton consists of two subsets. DAPlankton_LAB contains images of cultured phytoplankton, providing a balanced dataset with minimal label uncertainty. DAPlankton_SEA consists of images collected from the Baltic Sea, providing challenging real-world data with large intra-class variance and class imbalance. We further present a benchmark comparison of three widely used DA methods.
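To make the idea of adapting across instruments concrete, the following is a minimal sketch of the CORAL loss, the alignment term behind Deep CORAL, one of the baselines named in the TL;DR. It penalizes the squared Frobenius distance between the feature covariance matrices of a source and a target domain. This is not the authors' implementation: the function name `coral_loss`, the use of NumPy instead of a deep learning framework, and the random arrays standing in for features from two instruments are all assumptions made for illustration.

```python
import numpy as np


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """CORAL loss between two feature batches of shape (n_samples, d).

    Computes ||C_s - C_t||_F^2 / (4 d^2), where C_s and C_t are the
    sample covariance matrices of the source and target features.
    """
    d = source.shape[1]
    # rowvar=False: rows are samples, columns are feature dimensions.
    cov_source = np.cov(source, rowvar=False)
    cov_target = np.cov(target, rowvar=False)
    return float(np.sum((cov_source - cov_target) ** 2) / (4.0 * d * d))


# Hypothetical stand-ins for features extracted from two instruments.
rng = np.random.default_rng(0)
source_features = rng.normal(size=(64, 8))
target_features = 2.0 * rng.normal(size=(48, 8)) + 1.0

loss_same = coral_loss(source_features, source_features)
loss_shifted = coral_loss(source_features, target_features)
```

In a Deep CORAL style setup, a term like this would be computed on network activations and added to the classification loss on labeled source data, so that no target labels are needed. Because covariances ignore the mean, a pure offset between domains does not change this loss; only differences in second-order statistics do.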

Paper Structure (21 sections, 1 equation, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Phytoplankton images from DAPlankton$_\mathrm{LAB}$. Each column contains a different imaging instrument (domain) and each row a different phytoplankton species.
  • Figure 2: Example images from DAPlankton$_\mathrm{LAB}$. Domains from left to right: CytoSense (CS), FlowCam (FC), and Imaging FlowCytobot (IFCB).
  • Figure 3: Example images from the DAPlankton$_\mathrm{SEA}$ dataset. Notice the large intra-class variation: (a) Chaetoceros sp.; (b) Chlorococcales; (c) Nodularia spumigena.