"ScatSpotter" -- A Dog Poop Detection Dataset

Jon Crall

TL;DR

ScatSpotter tackles the practical and technical challenge of detecting small, camouflaged waste objects by introducing a large, open dataset of high-resolution outdoor images annotated with polygons for dog feces. It uses a before/after/negative (BAN) capture protocol to enrich the learning signal and benchmarks a diverse set of baselines, including ViT-based segmentation, MaskRCNN, YOLO-v9, and Grounding DINO; tuned Grounding DINO achieves the strongest box-level performance (AP of roughly $0.69$--$0.70$). The work also systematically compares centralized and decentralized data-distribution mechanisms, quantifying transfer times and illustrating the trade-offs between speed and data integrity. By providing detailed dataset documentation, strong baselines, and reproducible distribution experiments, ScatSpotter aims to advance small-object waste detection for urban cleanliness, environmental monitoring, and downstream ecological tasks, while promoting transparent, open-science practices.
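One way to picture the BAN protocol is as a grouping of three related captures. The minimal sketch below uses a hypothetical `BANTriplet` record and a hypothetical `to_training_samples` helper, neither of which is the dataset's actual schema or loader. It assumes two things: only the "before" image carries polygon annotations, and the "after" and "negative" images contribute background-only samples. It also treats either of those two images as possibly missing for a given location; that choice is an assumption too.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A polygon is a list of (x, y) vertices in image pixel coordinates.
Polygon = List[Tuple[float, float]]


@dataclass
class BANTriplet:
    """Hypothetical record for one before/after/negative capture group."""
    before: str                     # image with the object present
    after: Optional[str] = None     # same viewpoint after removal (assumed optional)
    negative: Optional[str] = None  # nearby scene, often with confusers (assumed optional)
    polygons: List[Polygon] = field(default_factory=list)  # annotations on `before` only


def to_training_samples(triplet: BANTriplet) -> List[Tuple[str, List[Polygon]]]:
    """Expand a triplet into (image_path, polygons) pairs.

    Assumption: only the "before" image contains annotated objects; the
    "after" and "negative" images become background-only samples.
    """
    samples = [(triplet.before, list(triplet.polygons))]
    for path in (triplet.after, triplet.negative):
        if path is not None:
            samples.append((path, []))
    return samples


# Usage with made-up file names.
triplet = BANTriplet(
    before="loc01_before.jpg",
    after="loc01_after.jpg",
    negative="loc01_negative.jpg",
    polygons=[[(10.0, 10.0), (40.0, 12.0), (35.0, 30.0)]],
)
samples = to_training_samples(triplet)
```

Treating the "after" view as a background-only sample gives a model a nearly identical scene without the object. This is one plausible reading of how BAN enriches the learning signal; this section does not say how the paper's training pipeline actually consumes these images.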

Abstract

Small, amorphous waste objects such as biological droppings and microtrash can be difficult to see, especially in cluttered scenes, yet they matter for environmental cleanliness, public health, and autonomous cleanup. We introduce "ScatSpotter": a new dataset of images annotated with polygons around dog feces, collected to train and study object detection and segmentation systems for small, potentially camouflaged outdoor waste. We gathered data in mostly urban environments using a "before/after/negative" (BAN) protocol: for a given location, we capture an image with the object present, an image from the same viewpoint after the object is removed, and a nearby negative scene that often contains visually similar confusers. Image collection began in 2020. This paper focuses on two dataset checkpoints, from 2024 and 2025. The dataset contains over 9000 images and 6000 polygon annotations. Of the author-captured images, we held out 691 for validation and used the rest for training. Through community participation we obtained a 121-image test set. Although small, it is independent of the author-collected images and provides some confidence that results generalize across photographers, devices, and locations. Because the test set is small, we report both validation and test results. We explore the difficulty of the dataset using off-the-shelf ViT, MaskRCNN, YOLO-v9, and DINO-v2 models. Zero-shot DINO performs poorly, indicating limited foundation-model coverage of this category. Tuned DINO is the best model, with a box-level average precision of 0.69 on the 691-image validation set and 0.70 on the test set. These results establish strong baselines and quantify the remaining difficulty of detecting small, camouflaged waste objects. To support open access to models and data, we compare centralized and decentralized distribution mechanisms and discuss trade-offs for sharing scientific data. Code and project details are hosted on GitHub.
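Box-level average precision summarizes how well ranked detections match ground-truth boxes. The following is a minimal single-class sketch. It assumes greedy matching at an IoU threshold of 0.5 and all-point interpolation of the precision-recall curve. These are common conventions, not necessarily the paper's evaluation protocol, so values from this sketch are not directly comparable to the reported 0.69 and 0.70. The function names `iou` and `average_precision` are illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Pred = Tuple[str, float, Box]            # (image_id, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Pred], truths: Dict[str, List[Box]],
                      iou_thresh: float = 0.5) -> float:
    """Single-class box AP with greedy matching and all-point interpolation."""
    num_true = sum(len(boxes) for boxes in truths.values())
    if num_true == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in truths.items()}

    # Visit predictions from most to least confident; each ground-truth box
    # can be matched at most once.
    is_tp: List[bool] = []
    for img, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(truths.get(img, [])):
            if not used[img][j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thresh:
            used[img][best_j] = True
            is_tp.append(True)
        else:
            is_tp.append(False)

    # Cumulative precision and recall after each ranked prediction.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, hit in enumerate(is_tp, start=1):
        tp += int(hit)
        precisions.append(tp / k)
        recalls.append(tp / num_true)

    # Make precision non-increasing, then sum the area under the PR curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy usage: one hit, one miss, then a second hit (IoU 0.81).
truths = {"a": [(0, 0, 10, 10)], "b": [(0, 0, 10, 10)]}
preds = [("a", 0.9, (0, 0, 10, 10)), ("b", 0.8, (50, 50, 60, 60)), ("b", 0.7, (1, 1, 10, 10))]
ap = average_precision(preds, truths)  # 5/6 for this toy case
```

A COCO-style evaluation would instead average over several IoU thresholds, which typically yields lower numbers than a single 0.5 threshold.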

Paper Structure

This paper contains 30 sections, 18 figures, 7 tables.

Figures (18)

  • Figure 1: (a) A challenging annotation case due to clutter and camouflage. (b) An image triplet from the BAN protocol.
  • Figure 2: A comparison of annotations across datasets, including ours. All polygon annotations are drawn in a single plot at $0.8$ opacity to show the distribution of annotation location, shape, and size in image coordinates.
  • Figure 3: Example images from clusters in a 2D UMAP projection (mcinnes_umap_2020). Each point in the top plot represents an image embedding projected to 2D, and numbered orange dots mark the nearby images shown in the columns below. Blue annotation boxes are shown. A clear separation emerges between snowy (columns 1-2) and non-snowy images (columns 3-13).
  • Figure 4: Dataset distributions. (a) Time and daylight scatterplot. (b) Annotation count histogram.
  • Figure 5: Qualitative results from validation-selected models applied to the same validation images. Subfigures (a-c) show results for ViT and MaskRCNN, including both the binarized classification map (true positives in green, false positives in red, false negatives in purple, true negatives in black) and the predicted heatmap before binarization. Subfigures (d-g) show bounding-box detections from YOLO-v9 and Grounding DINO, using the same color scheme (blue = true-positive predicted boxes; green = matched ground truth). Subfigure (h) shows the input image.
  • ...and 13 more figures
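The binarized classification maps described for Figure 5 come from comparing a thresholded heatmap against the ground-truth mask pixel by pixel. The minimal sketch below is dependency-free and assumes a binarization threshold of 0.5, which the caption does not state. The exact RGB values for the named colors and the function name `classification_map` are also hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# Color names follow the Figure 5 caption; the exact RGB values are assumptions.
TP_COLOR: RGB = (0, 255, 0)    # green: predicted positive, truly positive
FP_COLOR: RGB = (255, 0, 0)    # red: predicted positive, truly negative
FN_COLOR: RGB = (128, 0, 128)  # purple: predicted negative, truly positive
TN_COLOR: RGB = (0, 0, 0)      # black: predicted negative, truly negative


def classification_map(
    heatmap: List[List[float]],
    truth: List[List[bool]],
    thresh: float = 0.5,  # hypothetical binarization threshold
) -> List[List[RGB]]:
    """Binarize a per-pixel heatmap and color each pixel by its confusion category."""
    out: List[List[RGB]] = []
    for heat_row, truth_row in zip(heatmap, truth):
        row: List[RGB] = []
        for score, is_true in zip(heat_row, truth_row):
            pred = score >= thresh
            if pred and is_true:
                row.append(TP_COLOR)
            elif pred:
                row.append(FP_COLOR)
            elif is_true:
                row.append(FN_COLOR)
            else:
                row.append(TN_COLOR)
        out.append(row)
    return out


# Toy 2x2 example covering all four categories.
heat = [[0.9, 0.9], [0.1, 0.1]]
truth = [[True, False], [True, False]]
colored = classification_map(heat, truth)
```

Plain nested lists keep the sketch self-contained. For full-resolution images, array operations with boolean masks would be the more practical choice.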