A geometric and deep learning reproducible pipeline for monitoring floating anthropogenic debris in urban rivers using in situ cameras

Gauthier Grimmer, Romain Wenger, Clément Flint, Germain Forestier, Gilles Rixhon, Valentin Chardon

TL;DR

This paper tackles the challenge of monitoring floating anthropogenic debris in urban rivers using fixed in situ cameras by integrating a reproducible, end-to-end pipeline that combines deep learning debris detection with a geometry-based size estimation module. It systematically compares multiple YOLO architectures under realistic, variable environmental conditions, introduces negative-image datasets and leakage-aware data partitioning, and demonstrates that YOLOv11-m (and YOLOv8-n in some setups) offer strong performance under constraints. The second major contribution is a projective-geometry framework that uses known camera intrinsics and extrinsics to estimate real-world object dimensions from monocular detections, augmented by regression-based corrections to achieve centimeter-level accuracy approaching the image-resolution limit. Together, the work provides a robust, low-cost pathway toward scalable, reproducible monitoring and, potentially, mass estimation of debris across catchments, with clear considerations of biases, data integrity, and practical deployment challenges.
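
The leakage-aware partitioning mentioned above can be illustrated with a minimal sketch. It assumes that frames captured close together in time are near-duplicates, so whole time blocks, rather than individual frames, are assigned to either the training or the test set. The function name `temporal_block_split`, the 60-minute block length, and the choice to hold out the most recent blocks are illustrative assumptions, not the protocol used in the paper.

```python
from collections import defaultdict
from datetime import datetime

_EPOCH = datetime(1970, 1, 1)

def temporal_block_split(frames, test_fraction=0.2, block_minutes=60):
    """Split (path, timestamp) pairs so that no time block spans both sets.

    Hypothetical helper: frames are grouped into fixed-length time blocks and
    the most recent blocks are held out for testing.
    """
    blocks = defaultdict(list)
    for path, ts in frames:
        # Index of the fixed-length block this (naive) timestamp falls into.
        key = int((ts - _EPOCH).total_seconds() // (block_minutes * 60))
        blocks[key].append(path)

    keys = sorted(blocks)
    if len(keys) < 2:
        raise ValueError("need at least two time blocks to split")

    # Hold out at least one block, but always leave one for training.
    n_test = min(len(keys) - 1, max(1, round(len(keys) * test_fraction)))
    test_keys = set(keys[-n_test:])

    train = [p for k in keys if k not in test_keys for p in blocks[k]]
    test = [p for k in keys if k in test_keys for p in blocks[k]]
    return train, test

# Example: ten frames taken every 10 minutes from 08:00 on the same day.
demo_frames = [
    (f"frame_{i:02d}.jpg", datetime(2025, 3, 4, 8 + (i * 10) // 60, (i * 10) % 60))
    for i in range(10)
]
demo_train, demo_test = temporal_block_split(demo_frames)
```

A purely random frame-level split would likely place consecutive, almost identical frames on both sides of the partition, which tends to inflate test scores; grouping by time block is one simple way to avoid that.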

Abstract

The proliferation of floating anthropogenic debris in rivers has emerged as a pressing environmental concern, with detrimental effects on biodiversity, water quality, and human activities such as navigation and recreation. The present study proposes a novel methodological framework for monitoring this debris using fixed in situ cameras. It makes two key contributions: (i) the continuous quantification and monitoring of floating debris using deep learning and (ii) the identification of the most suitable deep learning model in terms of accuracy and inference speed under complex environmental conditions. The models are tested across a range of environmental conditions and learning configurations, including experiments on biases related to data leakage. Furthermore, a geometric model is implemented to estimate the actual size of detected objects from a 2D image, using both the intrinsic and extrinsic parameters of the camera. The findings underscore the importance of the dataset constitution protocol, particularly the integration of negative images and the handling of temporal leakage. Finally, the feasibility of metric object size estimation using projective geometry coupled with regression corrections is demonstrated. This approach paves the way for the development of robust, low-cost, automated monitoring systems for urban aquatic environments.
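
As an illustration of how camera intrinsics and extrinsics can turn a 2D detection into a metric size, the sketch below back-projects pixels onto the water surface with a standard pinhole model. It is not the authors' implementation. It assumes a flat water plane at Z = 0 in world coordinates, no lens distortion, and the convention x_cam = R · X_world + t. It also omits the regression-based correction described in the paper. The function names and the demo camera values are hypothetical.

```python
import numpy as np

def pixel_to_water_plane(uv, K, R, t):
    """Intersect the viewing ray of pixel (u, v) with the plane Z = 0 (world frame)."""
    # Ray direction in camera coordinates, then rotated into the world frame.
    ray_cam = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    ray_world = R.T @ ray_cam
    # Camera centre in world coordinates for x_cam = R @ X_world + t.
    center = -R.T @ t
    if abs(ray_world[2]) < 1e-12:
        raise ValueError("ray is parallel to the water plane")
    s = -center[2] / ray_world[2]
    if s <= 0:
        raise ValueError("water plane is behind the camera for this pixel")
    return center + s * ray_world

def box_width_m(box, K, R, t):
    """Metric width of a bounding box (x1, y1, x2, y2), measured along its bottom edge."""
    x1, _, x2, y2 = box
    left = pixel_to_water_plane((x1, y2), K, R, t)
    right = pixel_to_water_plane((x2, y2), K, R, t)
    return float(np.linalg.norm(right - left))

# Hypothetical camera: focal length 1000 px, mounted 5 m above the water,
# looking straight down (camera z-axis points along world -Z).
K_demo = np.array([[1000.0, 0.0, 640.0],
                   [0.0, 1000.0, 360.0],
                   [0.0, 0.0, 1.0]])
R_demo = np.array([[1.0, 0.0, 0.0],
                   [0.0, -1.0, 0.0],
                   [0.0, 0.0, -1.0]])
t_demo = np.array([0.0, 0.0, 5.0])

# A 200 px wide box seen from 5 m with f = 1000 px spans 200 / 1000 * 5 = 1.0 m.
demo_width = box_width_m((640, 300, 840, 360), K_demo, R_demo, t_demo)
```

The bottom edge of the box is used because, for an oblique camera, it is the part of the detection most likely to lie on the water plane; this is a design choice of the sketch, not a detail taken from the paper.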

Paper Structure

This paper contains 53 sections, 27 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: Methodology employed: (1) Acquired data are manually annotated to build the initial dataset. (2) The dataset is pre-processed so that (3) several models can be trained to detect debris in rivers. (4) Dimensions of detected objects are predicted and then (5) corrected to reduce errors.
  • Figure 2: Location of Steingiessen River and inlet and outlet cameras (north of Greater Strasbourg)
  • Figure 3: Data acquired on March 4, 2025 upstream (left) and downstream (right) on the Steingiessen River. Manually digitized bounding boxes mark anthropogenic debris and non-debris materials.
  • Figure 4: Evolution of Daily Insolation Duration (INST), Daily Global Radiation (GLOT) and fraction of sunshine in relation to day length (SIGMA) during February 2025 at the Strasbourg - Entzheim weather station (France) (“Données climatologiques de base - quotidiennes”, Météo-France, 2025). In grey, cloudy weather and in pink, sunny weather.
  • Figure 5: General architecture of a YOLO object detection model. The structure is divided into three main components: the backbone, responsible for extracting hierarchical features from the input image using a series of convolutional and pooling layers; the neck, which enhances feature aggregation across different scales (often using modules such as PANet or FPN in recent versions); and the head, which performs final object detection by predicting bounding boxes, objectness scores, and class probabilities. While the figure reflects a simplified backbone resembling early YOLO versions, the general structure remains consistent across modern versions such as YOLOv5, YOLOv8, and YOLOv11, with architectural refinements aimed at improving speed and accuracy.
  • ...and 11 more figures