FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting

Yi Liu; Jingyu Song; Vedanth Kallakuri; Katherine A. Skinner

TL;DR

FishDetector-R1 is a unified, weakly supervised framework for underwater fish detection, segmentation, and counting. It couples a multimodal large language model (MLLM) with a segmentation foundation model through a detect-to-count prompt, and fine-tunes the MLLM with Reinforcement Learning from Verifiable Reward (RLVR) using GRPO. The approach enforces spatial and numerical consistency between localization and counting, yields strong pixel-wise segmentation from sparse annotations, and transfers zero-shot to SUIM. The key contributions are the detect-to-count prompt, the RLVR objective, and ablations showing that the reward signals are complementary. On DeepFish, the framework matches or exceeds fully supervised baselines in some settings, which suggests practical value for scalable ecological monitoring and marine habitat assessment.

Abstract

Analyzing underwater fish imagery is critical for ecological monitoring but remains difficult due to visual degradation and costly annotations. We introduce FishDetector-R1, a unified MLLM-based framework for fish detection, segmentation, and counting under weak supervision. On the DeepFish dataset, our framework achieves substantial gains over baselines, improving AP by 20% and mIoU by 10%, while reducing MAE by 30% and GAME by 35%. These improvements stem from two key components: a novel detect-to-count prompt that enforces spatially consistent detections and counts, and Reinforcement Learning from Verifiable Reward (RLVR) with a complementary scalable paradigm leveraging sparse point labels. Ablation studies further validate the effectiveness of this reward design. Moreover, the improvement generalizes well to other underwater datasets, confirming strong cross-domain robustness. Overall, FishDetector-R1 provides a reliable and scalable solution for accurate marine visual understanding via weak supervision. The project page for FishDetector-R1 is https://umfieldrobotics.github.io/FishDetector-R1.
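
The counting metrics named in the abstract, MAE and GAME, can be illustrated with a short sketch. It assumes the standard definition of the Grid Average Mean Absolute Error: at level L the image is split into a 2^L x 2^L grid, absolute count errors are summed over cells, and the result is averaged over images, so that level 0 reduces to MAE. Representing detections as (x, y) points and the names `grid_counts`, `game_single`, and `mean_game` are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates (assumed representation)


def grid_counts(points: Sequence[Point], width: int, height: int, level: int) -> List[int]:
    """Count the points falling in each cell of a 2^level x 2^level grid."""
    n = 2 ** level
    counts = [0] * (n * n)
    for x, y in points:
        # Clamp so points on or beyond the image border land in an edge cell.
        col = max(0, min(int(x * n / width), n - 1))
        row = max(0, min(int(y * n / height), n - 1))
        counts[row * n + col] += 1
    return counts


def game_single(pred: Sequence[Point], gt: Sequence[Point],
                width: int, height: int, level: int) -> int:
    """Sum of per-cell absolute count errors for one image."""
    p = grid_counts(pred, width, height, level)
    g = grid_counts(gt, width, height, level)
    return sum(abs(a - b) for a, b in zip(p, g))


def mean_game(preds: Sequence[Sequence[Point]], gts: Sequence[Sequence[Point]],
              sizes: Sequence[Tuple[int, int]], level: int) -> float:
    """Average the per-image error over a dataset; level 0 is the counting MAE."""
    errors = [game_single(p, g, w, h, level)
              for p, g, (w, h) in zip(preds, gts, sizes)]
    return sum(errors) / len(errors)


# Hypothetical example: two predicted and two ground-truth fish in a 100x100 image.
pred_points = [(10.0, 10.0), (90.0, 90.0)]
gt_points = [(10.0, 10.0), (10.0, 90.0)]
mae = mean_game([pred_points], [gt_points], [(100, 100)], level=0)
game1 = mean_game([pred_points], [gt_points], [(100, 100)], level=1)
```

In this example the total count is correct, so the level-0 error is zero, while the level-1 error is nonzero because one fish is placed in the wrong quadrant. This is why a grid-based metric is a useful complement to MAE when counts are meant to agree with localization.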
