CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
Hongyong Han, Wei Wang, Gaowei Zhang, Mingjie Li, Yi Wang
TL;DR
CoralVQA introduces the first large-scale, genus-level VQA dataset for coral reef images (12,805 images, 277,653 QA pairs across 16 dimensions), assembled via a six-stage pipeline in collaboration with marine biologists. It serves as a rigorous benchmark, evaluating multiple LVLMs on standard, cross-region, and bleaching-coverage tasks and revealing substantial gaps in domain knowledge, regional generalization, and complex ecological reasoning. Open-ended questions and morphology-related tasks prove particularly challenging, which highlights the need to integrate marine biology knowledge into vision-language systems if they are to support coral conservation effectively. The dataset and framework offer a foundation for developing specialized LVLMs to assist conservation practitioners and educators, and they point to clear directions for improving domain-specific VQA capabilities.
Abstract
Coral reefs are vital yet vulnerable ecosystems that require continuous monitoring to support conservation. While coral reef images provide essential information for coral monitoring, interpreting them remains challenging because it requires domain expertise. Visual Question Answering (VQA), powered by Large Vision-Language Models (LVLMs), has great potential to enable user-friendly interaction with coral reef images. However, applying VQA to coral imagery demands a dedicated dataset that addresses two key challenges: domain-specific annotations and multidimensional questions. In this work, we introduce CoralVQA, the first large-scale VQA dataset for coral reef analysis. It contains 12,805 real-world coral images from 67 coral genera collected from three oceans, along with 277,653 question-answer pairs that comprehensively assess ecological and health-related conditions. To construct this dataset, we develop a semi-automatic data construction pipeline in collaboration with marine biologists to ensure both scalability and professional-grade data quality. CoralVQA presents novel challenges and provides a comprehensive benchmark for studying vision-language reasoning on coral reef images. By evaluating several state-of-the-art LVLMs, we reveal key limitations and opportunities. These insights form a foundation for future LVLM development, with a particular emphasis on supporting coral conservation efforts.
