A geometric and deep learning reproducible pipeline for monitoring floating anthropogenic debris in urban rivers using in situ cameras
Gauthier Grimmer, Romain Wenger, Clément Flint, Germain Forestier, Gilles Rixhon, Valentin Chardon
TL;DR
This paper presents a reproducible, end-to-end pipeline for monitoring floating anthropogenic debris in urban rivers with fixed in situ cameras, combining deep learning debris detection with a geometry-based size estimation module. It systematically compares multiple YOLO architectures under realistic, variable environmental conditions, introduces negative-image datasets and leakage-aware data partitioning, and shows that YOLOv11-m (and YOLOv8-n in some setups) offer strong performance under constraints. The second major contribution is a projective-geometry framework that uses known camera intrinsics and extrinsics to estimate real-world object dimensions from monocular detections, augmented by regression-based corrections to reach centimeter-level accuracy approaching the image-resolution limit. Together, the work provides a robust, low-cost pathway toward scalable, reproducible monitoring and, potentially, mass estimation of debris across catchments, with explicit consideration of biases, data integrity, and practical deployment challenges.
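As an illustration of what leakage-aware partitioning can look like for frames from a fixed camera, the sketch below assigns whole time blocks to either the training or the test set, so that near-identical consecutive frames never end up on both sides. It is a minimal sketch and not the authors' protocol: the function name `temporal_block_split`, the one-hour block length, and the "every fifth block" rule are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def temporal_block_split(timestamps, block=timedelta(hours=1), test_every=5):
    """Split frame indices so that all frames in one time block share a split.

    Hypothetical helper: block length and test_every are illustrative values.
    """
    t0 = min(timestamps)
    train_idx, test_idx = [], []
    for i, ts in enumerate(timestamps):
        # Index of the time block this frame falls into, counted from t0.
        block_id = int((ts - t0) / block)
        if block_id % test_every == test_every - 1:
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx


# Example: one frame every 10 minutes for 10 hours (60 frames, 10 blocks).
start = datetime(2024, 1, 1, 8, 0)
frames = [start + timedelta(minutes=10 * k) for k in range(60)]
train_idx, test_idx = temporal_block_split(frames)
```

A random per-frame split would instead scatter frames taken minutes apart across both sets, which tends to inflate test scores for a static camera view.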
Abstract
The proliferation of floating anthropogenic debris in rivers has emerged as a pressing environmental concern, with detrimental effects on biodiversity, water quality, and human activities such as navigation and recreation. The present study proposes a novel methodological framework for monitoring this debris using fixed, in-situ cameras. It makes two key contributions: (i) the continuous quantification and monitoring of floating debris using deep learning and (ii) the identification of the most suitable deep learning model in terms of accuracy and inference speed under complex environmental conditions. The models are tested across a range of environmental conditions and learning configurations, including experiments on biases related to data leakage. Furthermore, a geometric model is implemented to estimate the actual size of detected objects from a 2D image, using both the intrinsic and extrinsic parameters of the camera. The findings underscore the importance of the dataset constitution protocol, particularly the integration of negative images and the consideration of temporal leakage. Finally, the feasibility of metric object estimation using projective geometry coupled with regression corrections is demonstrated. This approach paves the way for the development of robust, low-cost, automated monitoring systems for urban aquatic environments.
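To make the geometric idea concrete, the following sketch back-projects the bottom corners of a detection box onto the water surface with a pinhole camera model and measures the distance between them. It is a simplified illustration under stated assumptions, not the paper's implementation: the water surface is taken as the plane Z = 0, the object is assumed to lie on that plane, lens distortion is ignored, and the regression-based correction is not included. The function names and the values of `K`, `R`, and `C` are hypothetical.

```python
import numpy as np


def pixel_to_water_plane(uv, K, R, C):
    """Intersect the viewing ray of pixel uv with the plane Z = 0.

    K: 3x3 intrinsic matrix; R: 3x3 world-to-camera rotation;
    C: camera centre in world coordinates (metres, Z pointing up).
    """
    ray_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    ray_world = R.T @ ray_cam  # rotate the ray into world coordinates
    if abs(ray_world[2]) < 1e-12:
        raise ValueError("ray is parallel to the water plane")
    t = -C[2] / ray_world[2]  # ray parameter where Z reaches 0
    if t <= 0:
        raise ValueError("pixel does not look at the water plane")
    return C + t * ray_world


def bbox_width_m(box, K, R, C):
    """Metric width of a box (x1, y1, x2, y2), measured along its bottom edge."""
    x1, _, x2, y2 = box
    left = pixel_to_water_plane((x1, y2), K, R, C)
    right = pixel_to_water_plane((x2, y2), K, R, C)
    return float(np.linalg.norm(right - left))


# Assumed setup: camera 5 m above the water, looking straight down,
# focal length 1000 px, principal point (640, 360).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0]])
C = np.array([0.0, 0.0, 5.0])

width = bbox_width_m((640, 300, 740, 360), K, R, C)
```

With these assumed values a box 100 px wide corresponds to about 0.5 m on the water surface (100 px × 5 m / 1000 px). For an oblique camera the same code applies with a different `R`, and the metric size of one pixel then grows with distance from the camera.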
