SparseUWSeg: Active Sparse Point-Label Augmentation for Underwater Semantic Segmentation

César Borja, Carlos Plou, Rubén Martinez-Cantín, Ana C. Murillo

TL;DR

SparseUWSeg tackles the challenge of generating dense underwater semantic segmentations from scarce expert annotations by coupling an active point-selection strategy with a hybrid label augmentation pipeline that merges SAM2-based masks and PLAS superpixel propagation. The method defines an acquisition function that balances proximity to object centroids and coverage, and propagates seeds through a two-stage augmentation to yield dense masks with full-image coverage. Across UCSD Mosaics and SUIM, SparseUWSeg outperforms state-of-the-art sparse-label augmentation baselines, achieving consistent gains in masked and unmasked metrics, particularly at smaller budgets and with active sampling. The work also releases an interactive annotation tool to help ecology researchers efficiently generate high-quality segmentation masks, bridging foundation-model capabilities with domain-specific marine imagery analysis.
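
The TL;DR names the two criteria of the acquisition function (proximity to object centroids and coverage) but not its exact form. The sketch below is therefore hypothetical. It is a greedy selector that, at each step, picks the pixel maximizing `lam * coverage - (1 - lam) * centroid_distance`. Here coverage is the distance to the nearest already-selected point, and centroids come from a pre-computed integer region map (assumed to come from some over-segmentation, e.g. superpixels). The names `select_points`, `region_centroids` and `lam` are illustrative and are not the paper's.

```python
import numpy as np


def region_centroids(regions):
    """Return {region_id: (row, col) centroid} for an (H, W) integer region map."""
    rows, cols = np.indices(regions.shape)
    return {
        int(r): np.array([rows[regions == r].mean(), cols[regions == r].mean()])
        for r in np.unique(regions)
    }


def select_points(regions, budget, lam=0.5):
    """Hypothetical greedy acquisition: score = lam * coverage - (1 - lam) * centroid distance."""
    h, w = regions.shape
    diag = np.hypot(h, w)  # normalizes all distances to [0, 1]
    rows, cols = np.indices(regions.shape)
    cand = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # every pixel is a candidate
    cents = region_centroids(regions)
    cent_of = np.array([cents[int(r)] for r in regions.ravel()])
    # Small value = the pixel lies close to the centroid of its own region.
    centroid_dist = np.linalg.norm(cand - cent_of, axis=1) / diag
    # Distance to the nearest selected point (1.0 while nothing is selected yet).
    coverage = np.ones(len(cand))
    chosen = []
    for _ in range(budget):
        score = lam * coverage - (1 - lam) * centroid_dist
        score[np.array(chosen, dtype=int)] = -np.inf  # never pick the same pixel twice
        best = int(np.argmax(score))
        chosen.append(best)
        # Shrink coverage with the distance to the newly selected point.
        coverage = np.minimum(coverage, np.linalg.norm(cand - cand[best], axis=1) / diag)
    return cand[chosen].astype(int)


# Toy example: an image split into a left and a right region.
regions = np.zeros((10, 20), dtype=int)
regions[:, 10:] = 1
print(select_points(regions, budget=2))
```

In this toy case the first pick lands next to the left region's centroid, and the coverage term pushes the second pick into the other region. With `lam=0` the selector would instead cluster all points around the first centroid, which illustrates why the two terms need to be balanced.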

Abstract

Semantic segmentation is essential to automate underwater imagery analysis for ecological monitoring. Unfortunately, fine-grained underwater scene analysis is still an open problem, even for top-performing segmentation models. The high cost of obtaining dense, expert-annotated segmentation labels hinders the supervision of models in this domain. While sparse point-labels are easier to obtain, they raise two challenges: which points to annotate, and how to propagate the sparse information. We present SparseUWSeg, a novel framework that addresses both issues. SparseUWSeg employs an active sampling strategy to guide annotators, maximizing the value of their point labels. It then propagates these sparse labels with a hybrid approach that leverages the best of both SAM2 and superpixel-based methods. Experiments on two diverse underwater datasets demonstrate the benefits of SparseUWSeg over state-of-the-art approaches, achieving up to +5% mIoU over D+NN. Our main contribution is the design and release of a simple but effective interactive annotation tool that integrates our algorithms. It enables ecology researchers to leverage foundation models and computer vision to efficiently generate high-quality segmentation masks for their data.
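
The abstract describes the hybrid propagation only at a high level. Figure 1's caption suggests the division of labor: SAM2 gives sharp masks on confident objects but leaves gaps, while PLAS superpixels cover broad regions. The following minimal sketch assumes both stages' outputs are already available as arrays; it makes no SAM2 or PLAS calls. The mask labels take priority, and every pixel still unlabeled is filled from the dense superpixel labels. The `UNLABELED` sentinel, the "larger masks first" overlap rule and both function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

UNLABELED = -1  # hypothetical sentinel for "no label yet"


def paint_masks(shape, masks, classes):
    """Stage 1 sketch: paint one binary mask per labeled seed point (e.g. SAM2 outputs).

    Larger masks are painted first so smaller, more specific masks win overlaps;
    this ordering rule is an assumption made for illustration.
    """
    out = np.full(shape, UNLABELED, dtype=int)
    for k in sorted(range(len(masks)), key=lambda k: int(masks[k].sum()), reverse=True):
        out[masks[k]] = classes[k]
    return out


def hybrid_merge(mask_labels, superpixel_labels):
    """Stage 2 sketch: keep mask labels where present and fill every remaining
    pixel from a dense superpixel-based propagation (e.g. PLAS output)."""
    return np.where(mask_labels != UNLABELED, mask_labels, superpixel_labels)


# Toy example: a large mask (class 1), a one-pixel mask (class 2),
# and dense superpixel labels that are all class 3.
big = np.zeros((4, 4), dtype=bool)
big[:2, :] = True
small = np.zeros((4, 4), dtype=bool)
small[0, 0] = True
stage1 = paint_masks((4, 4), [big, small], [1, 2])
dense = hybrid_merge(stage1, np.full((4, 4), 3))
print(dense)
```

Here the one-pixel mask overrides the larger one at (0, 0). The bottom half, which no mask covers, falls back to the superpixel label, so the result has full-image coverage.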

Paper Structure

This paper contains 32 sections, 8 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Sparse point-label augmentation of 30 labeled points (yellow circles shown in first two columns) with different approaches. Propagation of labels assigned to randomly sampled points (top row) vs. points sampled by our dynamic strategy (bottom row). (a) input image with sparse point-labels (PL) superposed, (b) point-labels over background for better visualization, (c) Superpixel-based PLAS raine2022point covers broader regions but often misses fine details and object boundaries, (d) SAM2 ravi2024sam expansion produces sharp masks on confident objects but leaves large areas unlabeled, (e) SparseUWSeg (ours) combines both strengths via active point selection and hybrid augmentation, and (f) ground truth segmentation for reference.
  • Figure 2: Intersection over Union (IoU) vs. masked-IoU. (a) ground truth with a small object surrounded by background. (b) predicted mask that over-segments into the background, resulting in a low IoU (35.60%). (c) the same predicted mask with background regions masked out, illustrating what masked-IoU effectively measures. Masked metrics thus focus on foreground accuracy without penalizing over-segmentation into the background.
  • Figure 3: Qualitative results. Label augmentation using PLAS, SAM2 and our SparseUWSeg on UCSD Mosaics (left block) and SUIM (right block). Each pair of rows shows the final segmentation of the same image, with random sampling on top and our DynamicPoints sampling on the bottom. In each block, from left to right: (a) input image with sparse point-labels superposed, (b) PLAS covers broad areas but with coarse region boundaries, (c) SAM2 produces more precise masks but leaves large areas unlabeled, (d) SparseUWSeg (Ours) combines both strengths to achieve a better overall segmentation, (e) ground truth (GT) segmentation. Our DynamicPoints sampling leads to better point placement, which supports effective label augmentation. Background color.
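
Figure 2 contrasts IoU with masked-IoU but does not give a formula. One plausible reading of the caption is that masked-IoU ignores pixels whose ground-truth label is background before computing the per-class IoU. The minimal sketch below assumes that reading, with background encoded as class `0` (both are assumptions). It reproduces the qualitative effect on a toy mask; it does not reproduce the 35.60% figure.

```python
import numpy as np


def class_iou(pred, gt, cls):
    """Standard IoU for one class: |pred AND gt| / |pred OR gt|."""
    p, g = pred == cls, gt == cls
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")


def masked_class_iou(pred, gt, cls, background=0):
    """Masked IoU (assumed definition): drop pixels whose ground truth is background."""
    keep = gt != background
    return class_iou(pred[keep], gt[keep], cls)


# Toy version of Figure 2: a 2x2 object predicted as a 4x4 blob spilling into background.
gt = np.zeros((10, 10), dtype=int)
gt[4:6, 4:6] = 1
pred = np.zeros((10, 10), dtype=int)
pred[3:7, 3:7] = 1
print(class_iou(pred, gt, 1), masked_class_iou(pred, gt, 1))
```

Standard IoU here is 4/16 = 0.25, because 12 predicted pixels fall on background. The masked variant discards those pixels and reports 1.0, mirroring the gap between panels (b) and (c).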