HideAndSeg: an AI-based tool with automated prompting for octopus segmentation in natural habitats
Alan de Aguiar, Michaella Pereira Andrade, Charles Morphy D. Santos, João Paulo Gois
TL;DR
HideAndSeg tackles the challenge of segmenting octopuses in natural underwater videos by integrating SAM2 with a specialized YOLOv11 detector and introducing two unsupervised metrics, $DICE_t$ and $NC_t$, to guide mask quality without ground-truth labels. The method starts with minimal manual annotation to seed SAM2, trains YOLO on bounding boxes derived from the resulting segmentation masks, and then uses YOLO detections to automate SAM2 prompts for full video segmentation. Results show high temporal consistency ($DICE_t$ ≈ 0.97) and low fragmentation ($NC_t$ ≈ 2.2), with YOLO achieving strong detection metrics (mAP@50 ≈ 0.971, mAP@50–95 ≈ 0.872), indicating robust performance under camouflage and occlusion. The approach enables scalable, automated behavioral analysis of wild cephalopods and suggests a path toward applying similar detector-driven prompting, guided by unsupervised metrics, to other wildlife in challenging habitats.
Abstract
Analyzing octopuses in their natural habitats is challenging due to their camouflage capability, rapid changes in skin texture and color, non-rigid body deformations, and frequent occlusions, all of which are compounded by variable underwater lighting and turbidity. To address the lack of large-scale annotated datasets, this paper introduces HideAndSeg, a novel, minimally supervised AI-based tool for segmenting videos of octopuses, and establishes a quantitative baseline for this task. HideAndSeg integrates SAM2 with a custom-trained YOLOv11 object detector. First, the user provides point coordinates, from which SAM2 generates the initial segmentation masks. These masks serve as training data for the YOLO model. The trained detector then supplies bounding-box prompts to SAM2, which fully automates the pipeline and eliminates the need for further manual intervention. We introduce two unsupervised metrics, temporal consistency ($DICE_t$) and new component count ($NC_t$), to quantitatively evaluate segmentation quality and guide mask refinement in the absence of ground-truth data, i.e., reference annotations used to train, validate, and test AI models. Results show that HideAndSeg achieves satisfactory performance, reducing segmentation noise compared to the manually prompted approach. Our method can re-identify and segment the octopus even after periods of complete occlusion in natural environments, a scenario in which the manually prompted model fails. By reducing the need for manual analysis in real-world scenarios, this work provides a practical tool that paves the way for more efficient behavioral studies of wild cephalopods.
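
The abstract names the two unsupervised metrics and the mask-to-box step but does not give their formulas, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that $DICE_t$ is the Dice overlap between the masks of consecutive frames and that $NC_t$ counts the connected components of the current mask that do not overlap the previous mask. The function names (`mask_to_box`, `temporal_dice`, `new_component_count`) and the handling of empty masks are hypothetical choices made for illustration.

```python
import numpy as np
from scipy import ndimage


def mask_to_box(mask):
    """Tight box (x_min, y_min, x_max, y_max) around a binary mask, or None if empty.

    Illustrates how a segmentation mask could yield a detector training box.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def temporal_dice(prev_mask, curr_mask):
    """Dice overlap between masks of consecutive frames (assumed reading of DICE_t)."""
    prev_mask = np.asarray(prev_mask, dtype=bool)
    curr_mask = np.asarray(curr_mask, dtype=bool)
    total = prev_mask.sum() + curr_mask.sum()
    if total == 0:
        # Both masks empty: treated as perfectly consistent (a convention, not from the paper).
        return 1.0
    intersection = np.logical_and(prev_mask, curr_mask).sum()
    return float(2.0 * intersection / total)


def new_component_count(prev_mask, curr_mask):
    """Number of connected components in the current mask that share no pixel
    with the previous mask (assumed reading of NC_t)."""
    prev_mask = np.asarray(prev_mask, dtype=bool)
    curr_mask = np.asarray(curr_mask, dtype=bool)
    # ndimage.label uses 4-connectivity by default for 2-D inputs.
    labels, n_components = ndimage.label(curr_mask)
    if n_components == 0:
        return 0
    # Labels of current components that overlap the previous mask.
    overlapping = np.unique(labels[prev_mask & (labels > 0)])
    return int(n_components - overlapping.size)
```

Under this reading, a video-level score would be obtained by averaging each function over all consecutive frame pairs; a sudden drop in `temporal_dice` or a spike in `new_component_count` would flag frames whose masks deserve refinement.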
