Detecting Endangered Marine Species in Autonomous Underwater Vehicle Imagery Using Point Annotations and Few-Shot Learning

Heather Doig, Oscar Pizarro, Jacquomo Monk, Stefan Williams

TL;DR

The paper tackles the detection of endangered handfish in Autonomous Underwater Vehicle (AUV) imagery where annotations are scarce. It introduces a three-part framework: pre-training a detector backbone on six common (base) marine species, applying a two-way copy-paste augmentation, and using Segment Anything to convert point annotations into training boxes. The approach yields up to a 48% improvement in average precision with as few as 50 novel-class examples, demonstrating reduced domain shift and improved detection of rare species. The method is broadly applicable to other rare or cryptic underwater objects and can support more efficient, real-time monitoring by AUVs.

Abstract

One use of Autonomous Underwater Vehicles (AUVs) is the monitoring of habitats associated with threatened, endangered and protected marine species, such as the handfish of Tasmania, Australia. Seafloor imagery collected by AUVs can be used to identify individuals within their broader habitat context, but the sheer volume of imagery collected can overwhelm efforts to locate rare or cryptic individuals. Machine learning models can be used to identify the presence of a particular species in images using a trained object detector, but the lack of training examples reduces detection performance, particularly for rare species that may only have a small number of examples in the wild. In this paper, inspired by recent work in few-shot learning, images and annotations of common marine species are exploited to enhance the ability of the detector to identify rare and cryptic species. Annotated images of six common marine species are used in two ways. Firstly, the common species are used in a pre-training step to allow the backbone to create rich features for marine species. Secondly, a copy-paste operation is used with the common species images to augment the training data. While annotations for more common marine species are available in public datasets, they are often in point format, which is unsuitable for training an object detector. A popular semantic segmentation model efficiently generates bounding box annotations for training from the available point annotations. Our proposed framework is applied to AUV images of handfish, increasing average precision by up to 48% compared to baseline object detection training. This approach can be applied to other objects with low numbers of annotations and promises to increase the ability to actively monitor threatened, endangered and protected species.
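
The point-to-box conversion described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' pipeline: the paper uses a segmentation model prompted with the annotated point, whereas `toy_segment_from_point` below is a hypothetical stand-in that simply grows a region of similar pixel intensity around the point. The function names, the `tol` parameter and the tiny grayscale image are all assumptions made for illustration. Only `mask_to_box`, which turns a binary mask into an axis-aligned bounding box, reflects the generic step that any mask-based approach would need.

```python
from collections import deque


def toy_segment_from_point(image, point, tol=10):
    """Toy stand-in for a point-prompted segmentation model (assumption).

    Grows a 4-connected region from `point` (row, col), accepting pixels whose
    intensity is within `tol` of the seed pixel. Returns a boolean mask.
    """
    h, w = len(image), len(image[0])
    r0, c0 = point
    seed = image[r0][c0]
    mask = [[False] * w for _ in range(h)]
    mask[r0][c0] = True
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and not mask[nr][nc] and abs(image[nr][nc] - seed) <= tol:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) in inclusive pixel coordinates,
    or None when the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


# Tiny grayscale image: a bright "animal" on a dark seafloor (made-up values).
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 200, 205, 0, 0],
    [0, 198, 202, 204, 0, 0],
    [0, 0, 0, 201, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# A single point annotation at row 2, column 2 becomes a training box.
box = mask_to_box(toy_segment_from_point(image, point=(2, 2)))
print(box)
```

In practice the mask would come from the segmentation model rather than region growing, and poor masks (such as those that include surrounding seafloor) would yield boxes that are too large, which is why mask quality matters for the generated annotations.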

Paper Structure (15 sections, 5 figures, 1 table)

Figures (5)

  • Figure 1: Handfish in imagery captured by AUV Sirius and AUV Nimbus. Manually identifying a small and cryptic species like handfish against a complex background is time-consuming and can lead to missed observations. The top image shows a complete image from AUV Nimbus with a single handfish. There were 7 images with handfish identified out of 11,171 images captured during the mission. The bottom row shows cropped examples of handfish. Images downloaded from IMOS-UMI Squidle+, http://squidle.org.
  • Figure 2: Detector training for the one-stage detector FCOS. The pre-training step trains the object detector on a base class dataset. In the fine-tuning step, training starts from the pre-trained detector, with the box classifier and box regressor layer replaced by a newly initialised layer that detects only the novel class (boxes in green). The first three layers of the backbone are frozen. The base class dataset is also used in the copy-paste operation to augment the dataset during fine-tuning.
  • Figure 3: Examples of semantic segmentation boundaries generated from point annotations. The first row for each dataset (A, B, E, F) shows successful segmentations. The row below (C, D, G, H) shows failed or poor-quality segmentations. (C) incorrectly includes the seafloor around the handfish, while (D) removes the distinctive hand-like fins of the handfish. Only errors in the handfish boundaries were manually corrected to ensure high-quality masks for the copy-paste operation.
  • Figure 4: Examples of the two-way copy-paste augmentation operation. Each pair of cropped images shows the result without (left) and with (right) copy-paste, using the segmentation boundary as the mask. The top row (A, B) contains handfish images with a base class instance added. The bottom row (C, D) contains base class images with a handfish instance added.
  • Figure 5: Examples of the six common marine species and the number of annotations used in the base class dataset. The base class annotations are used in pre-training the backbone and for the two-way copy-paste augmentation operation.
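
The two-way copy-paste operation in Figure 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are small nested lists of grayscale values, `paste_instance` is a hypothetical helper, and the paste location is chosen by hand. Scaling, blending, random placement and occlusion handling are not modelled here.

```python
def paste_instance(src_img, src_mask, dst_img, offset):
    """Copy the masked pixels of `src_img` into a copy of `dst_img`.

    The top-left corner of the mask's bounding box is placed at `offset`
    (row, col). Returns (new_image, box), where box is the pasted instance's
    (x_min, y_min, x_max, y_max) in inclusive pixel coordinates, or None if
    nothing landed inside the destination image.
    """
    rows = [r for r, row in enumerate(src_mask) if any(row)]
    cols = [c for c in range(len(src_mask[0])) if any(row[c] for row in src_mask)]
    out = [row[:] for row in dst_img]  # leave the destination image untouched
    if not rows:
        return out, None
    r0, c0 = rows[0], cols[0]
    h, w = len(out), len(out[0])
    pasted = []
    for r in rows:
        for c in cols:
            if src_mask[r][c]:
                rr, cc = r - r0 + offset[0], c - c0 + offset[1]
                if 0 <= rr < h and 0 <= cc < w:
                    out[rr][cc] = src_img[r][c]
                    pasted.append((rr, cc))
    if not pasted:
        return out, None
    ys = [p[0] for p in pasted]
    xs = [p[1] for p in pasted]
    return out, (min(xs), min(ys), max(xs), max(ys))


# Made-up 4x4 images: 9 marks handfish pixels, 7 marks base class pixels.
handfish_img = [[1, 1, 1, 1], [1, 9, 9, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
handfish_mask = [[v == 9 for v in row] for row in handfish_img]
base_img = [[2, 2, 2, 2], [2, 2, 2, 2], [7, 2, 2, 2], [7, 7, 2, 2]]
base_mask = [[v == 7 for v in row] for row in base_img]

# Direction 1: add a base class instance to a handfish image.
aug_handfish, base_box = paste_instance(base_img, base_mask, handfish_img, (2, 2))

# Direction 2: add a handfish instance to a base class image.
aug_base, handfish_box = paste_instance(handfish_img, handfish_mask, base_img, (0, 2))

print(base_box, handfish_box)
```

The second direction is what creates additional handfish training examples against new backgrounds; the returned box for the pasted handfish would be added to that image's annotations.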