RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety

Andrei Dumitriu, Florin Tatui, Florin Miron, Aakash Ralhan, Radu Tudor Ionescu, Radu Timofte

TL;DR

RipVIS tackles rip current detection by providing the first large-scale video instance segmentation benchmark for rip currents. It comprises 184 videos (212,328 frames): 150 videos (163,528 frames) with rip currents and 34 videos (48,800 frames) without, collected from drones, mobile phones, and fixed beach cameras, with most videos annotated at 5 FPS. The authors benchmark two-stage and one-stage models (Mask R-CNN, Cascade Mask R-CNN, SparseInst, and YOLO11) and introduce Temporal Confidence Aggregation (TCA), a pixel-level temporal post-processing method that aggregates frame-level predictions into stabilized temporal heatmaps to reduce false negatives and improve boundary consistency. Key contributions include the dataset with a train/validation/test split, baseline model analyses, and the TCA technique, all released through the RipVIS website to foster ongoing community collaboration. The work supports beach safety through more reliable rip current segmentation and offers a practical platform for developing robust, temporally aware detection systems in diverse coastal environments.

Abstract

Rip currents are strong, localized and narrow currents of water that flow outwards into the sea, causing numerous beach-related injuries and fatalities worldwide. Accurate identification of rip currents remains challenging due to their amorphous nature and the lack of annotated data, since annotation often requires expert knowledge. To address these issues, we present RipVIS, a large-scale video instance segmentation benchmark explicitly designed for rip current segmentation. RipVIS is an order of magnitude larger than previous datasets, featuring $184$ videos ($212,328$ frames), of which $150$ videos ($163,528$ frames) contain rip currents, collected from various sources, including drones, mobile phones, and fixed beach cameras. Our dataset encompasses diverse visual contexts, such as wave-breaking patterns, sediment flows, and water color variations, across multiple global locations, including the USA, Mexico, Costa Rica, Portugal, Italy, Greece, Romania, Sri Lanka, Australia and New Zealand. Most videos are annotated at $5$ FPS to ensure accuracy in dynamic scenarios. The remaining $34$ videos ($48,800$ frames) contain no rip currents. We conduct comprehensive experiments with Mask R-CNN, Cascade Mask R-CNN, SparseInst and YOLO11, fine-tuning these models for the task of rip current segmentation. Results are reported in terms of multiple metrics, with a particular focus on the $F_2$ score to prioritize recall and reduce false negatives. To enhance segmentation performance, we introduce a novel post-processing step based on Temporal Confidence Aggregation (TCA). RipVIS aims to set a new standard for rip current segmentation, contributing towards safer beach environments. We offer a benchmark website to share data, models, and results with the research community, encouraging ongoing collaboration and future contributions, at https://ripvis.ai.
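
To make the emphasis on the $F_2$ score concrete, the sketch below computes the standard F-beta score, $F_\beta = (1+\beta^2)\,P\,R / (\beta^2 P + R)$, and compares two hypothetical models. The function names and the precision/recall values are illustrative assumptions, not figures from the paper, and the benchmark's own evaluation protocol may differ in detail.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # convention: score is 0 when precision and recall are both 0
    return (1 + beta ** 2) * precision * recall / denom


def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Equivalent form written in terms of true/false positives and false negatives."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Two hypothetical models (made-up numbers, for illustration only).
cautious = f_beta(precision=1.0, recall=0.5)   # misses half of the rip currents
sensitive = f_beta(precision=0.5, recall=1.0)  # finds all, with false alarms

# F1 treats both models identically, while F2 prefers the high-recall one.
f1_cautious = f_beta(1.0, 0.5, beta=1.0)
f1_sensitive = f_beta(0.5, 1.0, beta=1.0)
```

With these made-up values, both models have the same $F_1$, but the high-recall model scores higher under $F_2$. This matches the safety argument in the abstract: a missed rip current is treated as costlier than a false alarm.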

Paper Structure

This paper contains 29 sections, 3 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Examples from our dataset, illustrating the diversity in locations, rip current types, viewpoint elevations and viewing angles. Rip currents are identifiable by distinct wave-breaking patterns, sediment transport, and instances of deflection rip currents. Rip current annotations are shown in red. Additional examples are provided in the supplementary material. Best viewed in color.
  • Figure 2: Map of the countries present in the RipVIS dataset. From left to right: USA, Mexico, Costa Rica, Portugal, Italy, Greece, Romania, Sri Lanka, Australia and New Zealand. Created with mapchart.net.
  • Figure 3: The proposed Temporal Confidence Aggregation (TCA) process, simplified. TCA leverages temporal coherence through downsampling, instance tracking, temporal smoothing, and hysteresis thresholding to create a stabilized temporal heatmap. Best viewed in color.
  • Figure 4: Examples of rip current detection results across processing stages, with each row illustrating a distinct case for the impact of TCA: 1. TCA smooths the rip current shape on a successful detection. 2. TCA recovers false negatives on the right side. 3. TCA reduces false positives of an over-segmented mask to better match the ground truth. 4. TCA enables detection across frames with consecutive false negatives. 5. Failure case: TCA reduces detection accuracy due to initial stationary detection followed by rapid camera movement.
  • Figure 5: A more detailed example of TCA in action. All rows show frames from the same video, illustrating how we mitigate the false negatives present in frame 176 (3rd row) and frame 202 (5th row).
  • ...and 4 more figures
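
The caption of Figure 3 lists temporal smoothing and hysteresis thresholding among the TCA steps. The sketch below illustrates only those two ideas on per-pixel confidence maps. It is not the authors' implementation: the exponential moving average, the `alpha`, `low`, and `high` values, and all function names are assumptions chosen for illustration, and the downsampling and instance-tracking steps are omitted.

```python
import numpy as np


def temporal_heatmaps(conf: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Smooth per-pixel confidences over time with an exponential moving average.

    conf has shape (T, H, W) with values in [0, 1]. The EMA is an assumed
    choice of smoother, not necessarily the one used in the paper.
    """
    heat = np.empty(conf.shape, dtype=float)
    heat[0] = conf[0]
    for t in range(1, conf.shape[0]):
        heat[t] = alpha * conf[t] + (1 - alpha) * heat[t - 1]
    return heat


def hysteresis_mask(heat: np.ndarray, low: float, high: float) -> np.ndarray:
    """Keep pixels >= high, plus pixels >= low that are 4-connected to them."""
    weak = heat >= low
    mask = heat >= high
    while True:
        grown = mask.copy()
        grown[1:, :] |= mask[:-1, :]   # propagate down
        grown[:-1, :] |= mask[1:, :]   # propagate up
        grown[:, 1:] |= mask[:, :-1]   # propagate right
        grown[:, :-1] |= mask[:, 1:]   # propagate left
        grown &= weak                  # never grow outside the weak region
        if np.array_equal(grown, mask):
            return mask
        mask = grown


# Toy clip: 3 frames of a 1x3 image; the detector misses everything in frame 1.
conf = np.array([
    [[0.7, 0.9, 0.0]],
    [[0.0, 0.0, 0.0]],
    [[0.7, 0.9, 0.0]],
])
heat = temporal_heatmaps(conf, alpha=0.5)
masks = np.stack([hysteresis_mask(h, low=0.3, high=0.4) for h in heat])
raw_frame1 = conf[1] >= 0.4   # per-frame thresholding alone: nothing detected
```

In this toy clip, plain per-frame thresholding detects nothing in the middle frame, while the smoothed heatmap retains enough confidence from the previous frame to keep the region. This is the kind of false-negative recovery that Figures 4 and 5 describe.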