Self-Supervised Pretraining for Fine-Grained Plankton Recognition

Joona Kareinen, Tuomas Eerola, Kaisa Kraft, Lasse Lensu, Sanna Suikkanen, Heikki Kälviäinen

TL;DR

This work tackles fine-grained plankton recognition under dataset shifts caused by diverse imaging instruments and evolving taxonomies. It applies Masked Autoencoder (MAE) self-supervised pretraining to a large, heterogeneous plankton dataset, followed by supervised fine-tuning with limited labeled data. By comparing ImageNet pretraining against pretraining on diverse plankton data, with and without target-domain content, the study shows that domain-specific self-supervised learning (SSL) markedly improves performance in low-label regimes, with further gains when unlabeled target data is available during pretraining. The contributions include the first MAE application to plankton data, a thorough evaluation of pretraining strategies, and publicly available pretrained models. Together these reduce labeling requirements for ecological monitoring and enable better cross-dataset generalization.

Abstract

Plankton recognition is an important computer vision problem due to plankton's essential role in ocean food webs and carbon capture, highlighting the need for species-level monitoring. However, this task is challenging due to its fine-grained nature and dataset shifts caused by different imaging instruments and varying species distributions. As new plankton image datasets are collected at an increasing pace, there is a need for general plankton recognition models that require minimal expert effort for data labeling. In this work, we study large-scale self-supervised pretraining for fine-grained plankton recognition. We first employ masked autoencoding and a large volume of diverse plankton image data to pretrain a general-purpose plankton image encoder. Then we utilize fine-tuning to obtain accurate plankton recognition models for new datasets with a very limited number of labeled training images. Our experiments show that self-supervised pretraining with diverse plankton data clearly increases plankton recognition accuracy compared to standard ImageNet pretraining when the amount of training data is limited. Moreover, the accuracy can be further improved when unlabeled target data is available and utilized during the pretraining.
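
The masked autoencoding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 75% mask ratio, the 224x224 input, and the 16x16 patch size are assumptions borrowed from common MAE setups rather than values stated here. The sketch only shows the two core ideas, hiding a random subset of patches from the encoder and scoring the reconstruction on the hidden patches alone.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists.

    mask_ratio=0.75 is an assumed default, not a value from the paper.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(original, reconstructed, masked):
    """Mean squared error over the masked patches only.

    Each patch is a flat list of pixel values (hypothetical layout).
    """
    total, count = 0.0, 0
    for idx in masked:
        for a, b in zip(original[idx], reconstructed[idx]):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0


# Assumed setup: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))
```

In this sketch the encoder would see only the `visible` patches, a lightweight decoder would predict the `masked` ones, and after pretraining the encoder alone would be fine-tuned on the small labeled set.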

Paper Structure

This paper contains 15 sections, 1 equation, 6 figures, 7 tables.

Figures (6)

  • Figure 1: We evaluate three different self-supervised pretraining strategies: (a) pretraining on ImageNet-1k and fine-tuning on plankton data, (b) pretraining on a diverse plankton dataset and fine-tuning on unseen plankton data, and (c) pretraining on a diverse plankton dataset and fine-tuning on a subset of the same dataset.
  • Figure 2: Example plankton images from different datasets: (a) Kaggle-Plankton [Cowen2015], (b) Lake Zooplankton [kyathanahally2021deep], (c) SYKE-Plankton-ZooScan_2024 [zooscan2024], (d) PMID2019 [li2020developing], (e) SYKE-Plankton-IFCB_2022 [syke2022], (f) UDE Diatoms in the Wild 2024 [Kloster2024], (g) DAPlankton [batrakhanov2024daplankton]. The images shown come from three different, visually similar classes within each dataset, highlighting the fine-grained nature of the data. For DAPlankton, the same three classes are shown across all instruments.
  • Figure 3: Masked autoencoder learns from the small unmasked patches (left) to reconstruct (middle) the original plankton image (right).
  • Figure 4: Mean accuracy and standard deviation for DAPlankton$_{\textrm{LAB}}$ across different labeled data subsets.
  • Figure 5: Confusion matrices for ViT-L (ImageNet) and ViT-L (no-daplankton), evaluated on 1% of labeled FC data from DAPlankton$_{\textrm{LAB}}$.
  • ...and 1 more figures