RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety

Andrei Dumitriu; Florin Tatui; Florin Miron; Aakash Ralhan; Radu Tudor Ionescu; Radu Timofte

TL;DR

RipVIS addresses rip current detection by providing the first large-scale video instance segmentation benchmark for rip currents. It comprises 184 videos (212,328 frames): 150 videos (163,528 frames) with rip currents and 34 videos (48,800 frames) without, collected from diverse sources and mostly annotated at 5 FPS. The authors benchmark multiple detectors (two-stage and one-stage) and introduce Temporal Confidence Aggregation (TCA), a pixel-level temporal post-processing method that aggregates frame-level predictions into stabilized temporal heatmaps to reduce false negatives and improve boundary consistency. Key contributions include the dataset with a train/validation/test split, baseline model analyses, and the TCA technique, all released through the RipVIS website to support ongoing community collaboration. The work aims to improve beach safety through more reliable rip current segmentation and provides a platform for developing temporally aware detection systems across diverse coastal environments.
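The idea of aggregating frame-level predictions into a temporal heatmap can be illustrated with a minimal sketch. This is not the authors' TCA implementation: it assumes a causal sliding-window mean of per-pixel confidences followed by a fixed threshold, and the function name, `window`, and `threshold` are hypothetical choices made only for illustration.

```python
import numpy as np


def aggregate_confidence(conf_maps, window=3, threshold=0.5):
    """Illustrative temporal aggregation (an assumption, not the paper's exact method).

    conf_maps: array-like of shape (T, H, W) with per-pixel confidences in [0, 1],
               one map per frame.
    window:    number of most recent frames averaged for each output frame.
    threshold: heatmap value at or above which a pixel is labeled as rip current.
    Returns (heatmaps, masks), both of shape (T, H, W).
    """
    conf = np.asarray(conf_maps, dtype=np.float64)
    if conf.ndim != 3:
        raise ValueError("conf_maps must have shape (T, H, W)")
    if window < 1:
        raise ValueError("window must be at least 1")

    heatmaps = np.empty_like(conf)
    for t in range(conf.shape[0]):
        # Average the current frame with up to (window - 1) preceding frames.
        start = max(0, t - window + 1)
        heatmaps[t] = conf[start:t + 1].mean(axis=0)

    masks = heatmaps >= threshold
    return heatmaps, masks


# A single pixel whose detector confidence drops in the third frame.
frames = np.array([0.9, 0.9, 0.2, 0.9]).reshape(4, 1, 1)
heatmaps, masks = aggregate_confidence(frames, window=3, threshold=0.5)
print(heatmaps.ravel(), masks.ravel())
```

In this toy input, thresholding the third frame on its own would miss the pixel, while the averaged heatmap keeps it above the threshold. This is the kind of false negative that temporal aggregation is meant to reduce. Averaging also delays the response when a rip current genuinely disappears, so the window length is a trade-off.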

Abstract

Rip currents are strong, localized and narrow currents of water that flow outwards into the sea, causing numerous beach-related injuries and fatalities worldwide. Accurate identification of rip currents remains challenging due to their amorphous nature and the lack of annotated data, which often requires expert knowledge. To address these issues, we present RipVIS, a large-scale video instance segmentation benchmark explicitly designed for rip current segmentation. RipVIS is an order of magnitude larger than previous datasets, featuring $184$ videos ($212{,}328$ frames), of which $150$ videos ($163{,}528$ frames) contain rip currents, collected from various sources, including drones, mobile phones, and fixed beach cameras. Our dataset encompasses diverse visual contexts, such as wave-breaking patterns, sediment flows, and water color variations, across multiple global locations, including the USA, Mexico, Costa Rica, Portugal, Italy, Greece, Romania, Sri Lanka, Australia and New Zealand. Most videos are annotated at $5$ FPS to ensure accuracy in dynamic scenarios. The remaining $34$ videos ($48{,}800$ frames) contain no rip currents. We conduct comprehensive experiments with Mask R-CNN, Cascade Mask R-CNN, SparseInst and YOLO11, fine-tuning these models for the task of rip current segmentation. Results are reported in terms of multiple metrics, with a particular focus on the $F_2$ score to prioritize recall and reduce false negatives. To enhance segmentation performance, we introduce a novel post-processing step based on Temporal Confidence Aggregation (TCA). RipVIS aims to set a new standard for rip current segmentation, contributing towards safer beach environments. We offer a benchmark website to share data, models, and results with the research community, encouraging ongoing collaboration and future contributions, at https://ripvis.ai.
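To make the emphasis on the $F_2$ score concrete, the sketch below computes the general $F_\beta$ score from true positive, false positive, and false negative counts, using the standard definition $F_\beta = (1+\beta^2)\,TP / ((1+\beta^2)\,TP + \beta^2\,FN + FP)$. The function name and the example counts are illustrative assumptions. How the benchmark counts matches (per pixel, per instance, or per frame) is not specified here.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """Standard F-beta score from raw counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    # With no positives predicted or present, the score is defined here as 0.
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0


# Two hypothetical detectors with the same F1 but different error types.
misses_rips = dict(tp=6, fp=0, fn=4)   # perfect precision, recall 0.6
false_alarms = dict(tp=6, fp=4, fn=0)  # precision 0.6, perfect recall

for name, counts in [("misses_rips", misses_rips), ("false_alarms", false_alarms)]:
    print(name, f_beta(beta=1.0, **counts), f_beta(beta=2.0, **counts))
```

Both hypothetical detectors score the same under $F_1$, but $F_2$ ranks the one that misses rip currents lower. This matches the stated goal of reducing false negatives in a safety setting.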
