Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study

Ziqiang Zheng, Yiwei Chen, Jipeng Zhang, Tuan-Anh Vu, Huimin Zeng, Yue Him Wong Tim, Sai-Kit Yeung

TL;DR

This study conducts a systematic evaluation of GPT-4V for marine analysis, spanning perception, statistics, domain-specific question answering, marine culture, advanced functions, and prompt engineering. It finds that GPT-4V exhibits strong OCR and general visual comprehension but struggles with fine-grained object recognition, precise counting, and full domain-specific reasoning without external tools. The results underscore substantial gaps between current MLLMs and professional marine expertise, while providing a structured benchmark and actionable insights for data, prompts, and tool integration in domain sciences. Overall, the work offers a rigorous baseline and guidance for future development of multimodal models in specialized scientific domains.

Abstract

Large language models (LLMs) have demonstrated a powerful ability to answer diverse queries as general-purpose assistants. The continued development of multi-modal large language models (MLLMs) empowers LLMs with the ability to perceive visual signals. The launch of GPT-4 (Generative Pre-trained Transformer) has generated significant interest in the research community. GPT-4V(ision) has demonstrated significant power in both academia and industry, becoming a focal point of a new generation of artificial intelligence. Despite the success of GPT-4V, the use of MLLMs for domain-specific analysis (e.g., marine analysis), which requires domain-specific knowledge and expertise, has received less attention. In this study, we carry out a preliminary yet comprehensive case study of using GPT-4V for marine analysis. This report conducts a systematic evaluation of the existing GPT-4V, assessing its performance on marine research and setting a new standard for future developments in MLLMs. The experimental results show that the responses generated by GPT-4V are still far from satisfying the domain-specific requirements of the marine professions. All images and prompts used in this study will be available at https://github.com/hkust-vgd/Marine_GPT-4V_Eval

Paper Structure (25 sections, 36 figures)

Figures (36)

  • Figure 1: Marine object recognition results under three settings: left column (random filename); middle column (meticulously forged, misleading filename); and right column (meaningful, aligned filename). Text in red marks incorrect responses and text in green marks correct responses. The prompt is "Recognize the object in this figure".
  • Figure 2: Marine object recognition results under the setting with random filenames. The prompt is "Recognize the object in this figure".
  • Figure 3: Marine object recognition results under the setting with meticulously forged, misleading filenames. The prompt is "Recognize the object in this figure".
  • Figure 4: Marine object recognition results under the setting with meaningful, aligned filenames. The prompt is "Recognize the object in this figure".
  • Figure 5: Recognition results across a wide spectrum of marine objects. The prompt is "Recognize this image and tell me the species name of the recognized objects". The ground truths are also provided.
  • ...and 31 more figures