FathomVerse: A community science dataset for ocean animal discovery
Genevieve Patterson, Joost Daniels, Benjamin Woodward, Kevin Barnard, Giovanna Sainz, Lonny Lundsten, Kakani Katija
TL;DR
FathomVerse addresses the challenge of deep-sea animal discovery by combining a community-science game with expert-backed consensus labeling to build FathomVerse v0, a dataset of 3843 images with 8092 bounding boxes across 12 morphological groups, recorded at two deep-seafloor locations that are new to computer vision. Player annotations are filtered by F1 score to produce reliable labels, and detector performance is evaluated using both FathomNet-derived and FathomVerse-specific models. Key contributions include a scalable annotation pipeline for hard-to-identify benthic fauna, an analysis of player performance and annotation reliability, and baseline detectors whose results highlight the need for architectural advances to handle deep-sea visual variability. This work lays the groundwork for improved fine-grained transfer learning, novel category discovery, and conservation-relevant analyses in ocean science, and it outlines practical paths to expand categories and background contextualization in future iterations.
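To make the idea of F1-filtered consensus labeling concrete, here is a minimal Python sketch. It is an illustration only, not the authors' pipeline: it assumes each player is scored by macro-averaged F1 on a small set of expert-labeled items, that players below a threshold are dropped, and that remaining labels are merged by strict majority vote. The names `macro_f1` and `consensus_labels`, the threshold `min_f1=0.7`, and the `min_votes=2` rule are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def macro_f1(predictions, gold):
    """Macro-averaged F1 of one player's labels against expert (gold) labels.

    predictions: dict item_id -> label chosen by the player
    gold:        dict item_id -> expert label
    Only items present in `gold` are scored.
    """
    scored = {k: v for k, v in predictions.items() if k in gold}
    labels = set(gold.values()) | set(scored.values())
    f1_scores = []
    for label in labels:
        tp = sum(1 for k, v in scored.items() if v == label and gold[k] == label)
        fp = sum(1 for k, v in scored.items() if v == label and gold[k] != label)
        # A gold item the player skipped or mislabeled counts as a miss.
        fn = sum(1 for k, g in gold.items() if g == label and scored.get(k) != label)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


def consensus_labels(player_votes, gold, min_f1=0.7, min_votes=2):
    """Keep players whose F1 on gold items is >= min_f1 (assumed threshold),
    then assign each non-gold item the label chosen by a strict majority
    of trusted players, requiring at least min_votes agreeing votes."""
    trusted = [p for p, votes in player_votes.items()
               if macro_f1(votes, gold) >= min_f1]
    tallies = defaultdict(Counter)
    for player in trusted:
        for item, label in player_votes[player].items():
            if item not in gold:
                tallies[item][label] += 1
    result = {}
    for item, counts in tallies.items():
        label, n = counts.most_common(1)[0]
        if n >= min_votes and n > sum(counts.values()) / 2:
            result[item] = label
    return result


if __name__ == "__main__":
    # Toy data: g* items have expert labels, x* items do not.
    gold = {"g1": "octopus", "g2": "sea star", "g3": "sea spider"}
    votes = {
        "alice": {"g1": "octopus", "g2": "sea star", "g3": "sea spider",
                  "x1": "octopus", "x2": "sea star"},
        "bob": {"g1": "octopus", "g2": "sea star", "g3": "sea spider",
                "x1": "octopus", "x2": "sea spider"},
        "carol": {"g1": "sea star", "g2": "octopus", "g3": "octopus",
                  "x1": "sea star", "x2": "sea spider"},
    }
    print(consensus_labels(votes, gold))
```

In this toy data, a player who mislabels every expert-checked item is excluded before voting, and an item on which the trusted players split evenly receives no consensus label. A real pipeline would also need to match bounding boxes across players before voting, which this sketch does not attempt.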
Abstract
Can computer vision help us explore the ocean? The ultimate challenge for computer vision is to recognize any visual phenomenon, not only the objects and animals humans encounter in their terrestrial lives. Previous datasets have explored everyday objects and fine-grained categories that humans see frequently. We present the FathomVerse v0 detection dataset to push the limits of our field by exploring deep-sea animals that rarely come into contact with people. These animals present a novel vision challenge. The FathomVerse v0 dataset consists of 3843 images with 8092 bounding boxes from 12 distinct morphological groups, recorded at two locations on the deep seafloor that are new to computer vision. It features visually perplexing scenarios, such as an octopus intertwined with a sea star, and confounding categories like vampire squids and sea spiders. This dataset can push forward research on topics like fine-grained transfer learning, novel category discovery, species distribution modeling, and carbon cycle analysis, all of which are important to the care and husbandry of our planet.
