"ScatSpotter" -- A Dog Poop Detection Dataset

Jon Crall

TL;DR

ScatSpotter tackles the practical and technical challenge of detecting small, camouflaged waste objects by introducing a large, open dataset of high-resolution outdoor images annotated with polygons for dog feces. It uses a before/after/negative (BAN) capture protocol to enrich learning signals and benchmarks a diverse set of baselines, including ViT-based segmentation, MaskRCNN, YOLO-v9, and GroundingDINO, with tuned GroundingDINO achieving the strongest box-level performance (AP around $0.69$--$0.70$). The work also compares centralized and decentralized data-distribution mechanisms, quantifying transfer times and illustrating the trade-offs between speed and data integrity. By providing detailed dataset documentation, baselines, and reproducible distribution experiments, ScatSpotter aims to advance small-object waste detection for urban cleanliness, environmental monitoring, and downstream ecological tasks, while promoting transparent, open-science practices.

Abstract

Small, amorphous waste objects such as biological droppings and microtrash can be difficult to see, especially in cluttered scenes, yet they matter for environmental cleanliness, public health, and autonomous cleanup. We introduce "ScatSpotter": a new dataset of images annotated with polygons around dog feces, collected to train and study object detection and segmentation systems for small, potentially camouflaged outdoor waste. We gathered data in mostly urban environments using a "before/after/negative" (BAN) protocol: for a given location, we capture an image with the object present, an image from the same viewpoint after removal, and a nearby negative scene that often contains visually similar confusers. Image collection began in 2020. This paper focuses on two dataset checkpoints, from 2024 and 2025. The dataset contains over 9000 images and 6000 polygon annotations. Of the author-captured images, we held out 691 for validation and used the rest for training. Via community participation we obtained a 121-image test set that, while small, is independent of the author-collected images and provides some confidence in generalization across photographers, devices, and locations. Because the test set is small, we report both validation and test results. We explore the difficulty of the dataset using off-the-shelf ViT, MaskRCNN, YOLO-v9, and DINO-v2 models. Zero-shot DINO performs poorly, indicating limited foundation-model coverage of this category. Tuned DINO is the best model, with a box-level average precision of 0.69 on the 691-image validation set and 0.7 on the test set. These results establish strong baselines and quantify the remaining difficulty of detecting small, camouflaged waste objects. To support open access to models and data, we compare centralized and decentralized distribution mechanisms and discuss trade-offs for sharing scientific data. Code and project details are hosted on GitHub.
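
One way to picture the before/after/negative (BAN) protocol is as a record of three images per location. The sketch below is illustrative only and is not the dataset's actual schema: the names `ImageRecord`, `BanTriple`, and `flatten`, the file paths, and the polygon coordinates are all hypothetical, and it assumes for simplicity that the "after" and "negative" images carry no annotations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A polygon is a list of (x, y) pixel coordinates (assumed representation).
Polygon = List[Tuple[float, float]]


@dataclass
class ImageRecord:
    path: str  # hypothetical file path
    polygons: List[Polygon] = field(default_factory=list)


@dataclass
class BanTriple:
    before: ImageRecord    # object present, annotated with polygons
    after: ImageRecord     # same viewpoint, after removal
    negative: ImageRecord  # nearby scene, may contain look-alike confusers


def flatten(triple: BanTriple) -> List[Tuple[str, str, int]]:
    """Return one (role, path, num_polygons) row per image in the triple."""
    return [
        (role, rec.path, len(rec.polygons))
        for role, rec in (
            ("before", triple.before),
            ("after", triple.after),
            ("negative", triple.negative),
        )
    ]


# Made-up example: one triangle-shaped annotation on the "before" image.
example = BanTriple(
    before=ImageRecord(
        "site01_before.jpg",
        [[(10.0, 12.0), (30.0, 12.0), (20.0, 28.0)]],
    ),
    after=ImageRecord("site01_after.jpg"),
    negative=ImageRecord("site01_negative.jpg"),
)
rows = flatten(example)
```

Keeping the three images grouped, rather than storing them as unrelated files, lets a training pipeline keep all images from one location in the same split and treat the "after" and "negative" images as hard negatives.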
