StreetView-Waste: A Multi-Task Dataset for Urban Waste Management

Diogo J. Paulo, João Martins, Hugo Proença, João C. Neves

TL;DR

StreetView-Waste introduces a large-scale, street-level fisheye dataset for urban waste management that supports three tasks from vehicle-mounted cameras: container detection, container tracking, and overflow segmentation. It couples baseline benchmarks with two complementary strategies, a heuristic-based tracking refinement and a model-agnostic, geometry-aware segmentation framework that fuses RGB with depth and surface normals, to probe real-world perception challenges. Results show strong detection performance for frame-based detectors, targeted improvements in counting with simple tracking heuristics, and mixed gains from geometric priors depending on model architecture, underscoring open challenges in preserving track identities and segmenting amorphous litter. The dataset and insights aim to spur robust, logistics-ready perception systems for smart waste management and to motivate future extensions such as GPS-enabled routing and more sophisticated multimodal fusion approaches.

Abstract

Urban waste management remains a critical challenge for the development of smart cities. Despite the growing number of litter detection datasets, the problem of monitoring overflowing waste containers, particularly from images captured by garbage trucks, has received little attention. While existing datasets are valuable, they often lack annotations for specific container tracking or are captured in static, decontextualized environments, limiting their utility for real-world logistics. To address this gap, we present StreetView-Waste, a comprehensive dataset of urban scenes featuring litter and waste containers. The dataset supports three key evaluation tasks: (1) waste container detection, (2) waste container tracking, and (3) waste overflow segmentation. Alongside the dataset, we provide baselines for each task by benchmarking state-of-the-art models in object detection, tracking, and segmentation. Additionally, we enhance baseline performance by proposing two complementary strategies: a heuristic-based method for improved waste container tracking and a model-agnostic framework that leverages geometric priors to refine litter segmentation. Our experimental results show that while fine-tuned object detectors achieve reasonable performance in detecting waste containers, baseline tracking methods struggle to accurately estimate their number; however, our proposed heuristics reduce the mean absolute counting error by 79.6%. Similarly, while segmenting amorphous litter is challenging, our geometry-aware strategy improves segmentation mAP@0.5 by 27% on lightweight models, demonstrating the value of multimodal inputs for this task. Ultimately, StreetView-Waste provides a challenging benchmark to encourage research into real-world perception systems for urban waste management.
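To make the counting metric in the abstract concrete, the sketch below shows one plausible way to compute a mean absolute counting error and its relative reduction. It assumes the error is the absolute difference between the number of unique tracks a tracker reports and the true number of containers, averaged over sequences; the abstract does not spell out the exact definition. The function names and all counts are hypothetical toy values, not results from the paper.

```python
def mean_absolute_counting_error(predicted_counts, true_counts):
    """Average |predicted - true| container count over sequences.

    Assumed definition: one count per video sequence, where the predicted
    count is the number of unique track IDs the tracker produced.
    """
    if len(predicted_counts) != len(true_counts):
        raise ValueError("count lists must have the same length")
    if not true_counts:
        raise ValueError("at least one sequence is required")
    errors = [abs(p - t) for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)


def relative_reduction(baseline_error, improved_error):
    """Fraction by which the error dropped relative to a positive baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical toy counts for four sequences (NOT data from the paper).
true_counts = [4, 6, 3, 5]
baseline_counts = [9, 11, 8, 10]  # fragmented tracks inflate the count
refined_counts = [5, 7, 4, 6]     # counts after some refinement step

baseline_mae = mean_absolute_counting_error(baseline_counts, true_counts)
refined_mae = mean_absolute_counting_error(refined_counts, true_counts)
reduction = relative_reduction(baseline_mae, refined_mae)

print(f"baseline MAE: {baseline_mae:.2f}")
print(f"refined MAE:  {refined_mae:.2f}")
print(f"relative reduction: {reduction:.1%}")
```

Under this reading, a reported reduction of 79.6% would mean the refined error is roughly one fifth of the baseline error.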

Paper Structure

This paper contains 17 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: We introduce StreetView-Waste, the first fisheye image dataset tailored for urban waste analysis. Captured using two 180$^{\circ}$ field-of-view cameras, the dataset mirrors the settings of real urban waste collection, providing high-quality annotations for three core tasks: 2D object detection, object tracking, and instance segmentation. These tasks are critical for logistics: detection and overflow segmentation enable status assessment, while tracking is essential for mapping municipal assets and optimizing collection routes. StreetView-Waste serves as a foundation for developing robust, real-world waste analysis models.
  • Figure 2: Class distribution statistics for our StreetView-Waste dataset. (a) Distribution of the 376 unique container tracks, highlighting the long-tail nature of the tracking task. (b) Distribution of the 71,170 total annotated instances for the detection task.
  • Figure 3: Overview of the geometry-aware method for the segmentation task. The input RGB image $I$ is processed using a geometry estimation module, which produces both a depth map $D$ and a surface normal map $N$. These are then concatenated with the original image to form an enriched input tensor $X_{\text{geo}} \in \mathbb{R}^{H \times W \times 7}$. This new representation is then fed to adapted segmentation models capable of handling multi-channel input, which output the predicted mask for overflowing waste.
  • Figure 4: Qualitative results for multi-object tracking. This scenario, common in our dataset, illustrates the difficulty of maintaining identities and explains the lower IDF1 score observed when temporal heuristics are introduced: the heuristics improve track continuity but corrupt identity, which lowers IDF1.
  • Figure 5: Qualitative results for the waste overflow segmentation task, comparing the original images with our proposed geometry-aware strategy. The columns show, from left to right: the original input image, the same image with the segmentation result from our method, the estimated surface normal map, and the estimated depth map.
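The enriched input described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. It assumes the depth map $D$ has one channel and the normal map $N$ has three, so that stacking them with the RGB image yields the 7 channels of $X_{\text{geo}}$. The channel order (RGB, depth, normals), the function name `build_geometry_input`, and the random arrays standing in for the geometry estimation module's outputs are all assumptions made for illustration.

```python
import numpy as np


def build_geometry_input(rgb, depth, normals):
    """Stack RGB (H, W, 3), depth (H, W) or (H, W, 1), and normals (H, W, 3).

    Returns an (H, W, 7) float32 array. The channel order RGB -> depth ->
    normals is an assumption; the caption only states that the maps are
    concatenated with the original image.
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    depth = np.asarray(depth, dtype=np.float32)
    normals = np.asarray(normals, dtype=np.float32)

    if depth.ndim == 2:
        depth = depth[..., np.newaxis]  # (H, W) -> (H, W, 1)

    height, width = rgb.shape[:2]
    expected = {
        "rgb": (rgb.shape, (height, width, 3)),
        "depth": (depth.shape, (height, width, 1)),
        "normals": (normals.shape, (height, width, 3)),
    }
    for name, (actual, wanted) in expected.items():
        if actual != wanted:
            raise ValueError(f"{name} has shape {actual}, expected {wanted}")

    return np.concatenate([rgb, depth, normals], axis=-1)


# Synthetic stand-ins for the image and the estimated geometry
# (a real pipeline would obtain depth and normals from an estimator).
rng = np.random.default_rng(0)
H, W = 4, 6
image = rng.random((H, W, 3), dtype=np.float32)
depth_map = rng.random((H, W), dtype=np.float32)
normal_map = rng.normal(size=(H, W, 3)).astype(np.float32)
normal_map /= np.linalg.norm(normal_map, axis=-1, keepdims=True)  # unit normals

x_geo = build_geometry_input(image, depth_map, normal_map)
print(x_geo.shape)
```

A segmentation model consuming this tensor would need its first layer adapted to accept 7 input channels instead of 3, which is what the caption refers to as "adapted segmentation models".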