The Garbage Dataset (GD): A Multi-Class Image Benchmark for Automated Waste Segregation

Suman Kunwar

TL;DR

The Garbage Dataset (GD) addresses the need for robust, large-scale waste image data to advance automated segregation. It contains 13,348 images across 10 household waste categories, gathered from multiple sources, with rigorous quality checks and standardized resolutions to enable fair benchmarks. Benchmarks across MobileNet, ResNet, and EfficientNetV2 variants show EfficientNetV2S achieving the top performance at $96.19\%$ accuracy and $0.96$ F1, while also highlighting the environmental cost of training and the limited gains from higher input resolution. The analysis emphasizes data-centric challenges such as class imbalance and background complexity, positioning GD as a practical, public benchmark that informs both model design and sustainable deployment for waste-management applications.

Abstract

This study introduces the Garbage Dataset (GD), a publicly available image dataset designed to advance automated waste segregation through machine learning and computer vision. The dataset covers 10 common household waste categories: metal, glass, biological, paper, battery, trash, cardboard, shoes, clothes, and plastic. It comprises 13,348 labeled images collected through multiple methods, including the DWaste mobile app and curated web sources. Methods included rigorous validation through checksums and outlier detection, analysis of class imbalance and visual separability via PCA/t-SNE, and assessment of background complexity using entropy and saliency measures. The dataset was benchmarked using state-of-the-art deep learning models (EfficientNetV2M, EfficientNetV2S, MobileNet, ResNet50, ResNet101), evaluated on performance metrics and operational carbon emissions. Experimental results indicate that EfficientNetV2S achieved the highest performance, with 96.19% accuracy and a 0.96 F1-score, though at a moderate carbon cost. The analysis revealed inherent dataset characteristics that require consideration, including class imbalance, a skew toward high-outlier classes (plastic, cardboard, paper), and brightness variations. The main conclusion is that GD provides a valuable, real-world benchmark for waste classification research while highlighting challenges that must be addressed for practical deployment: class imbalance, background complexity, and environmental trade-offs in model selection. The dataset is publicly released to support further research in environmental sustainability applications.
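
The abstract mentions checksum-based validation and entropy-based background analysis without giving implementation details. The following is a minimal sketch of what such checks might look like, not the author's pipeline. It assumes SHA-256 as the checksum and Shannon entropy of the 8-bit grayscale intensity histogram as the complexity measure; the function names `sha256_of`, `find_exact_duplicates`, and `shannon_entropy` are hypothetical.

```python
import hashlib
import math
from collections import Counter
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes (assumed checksum)."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group file names by checksum and keep only groups with more than one file."""
    groups: Dict[str, List[str]] = {}
    for name, data in files.items():
        groups.setdefault(sha256_of(data), []).append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}


def shannon_entropy(pixels: List[int]) -> float:
    """Shannon entropy, in bits, of a flat list of grayscale intensities."""
    if not pixels:
        return 0.0
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under these assumptions, byte-identical images share a digest and can be flagged as duplicates, while a uniform background yields an entropy of zero and a cluttered one yields a higher value. Near-duplicates (resized or re-encoded copies) are not caught by a byte-level checksum and would need a perceptual method instead.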

Paper Structure (12 sections, 10 figures, 1 table)

Introduction
The Garbage Dataset (GD)
Data Collection and Curation
Dataset Statistics and Structure
Dataset Quality and Validation
Dataset Analysis
Background and Foreground Analysis
Class Imbalance and Visual Separability
Benchmark Experiments
Experimental Setup
Results and Discussion
Conclusion

Figures (10)

Figure 1: Example images from the dataset, illustrating variation in object type, scene context, and image quality
Figure 2: Interface and workflow of the DWaste mobile application for field data collection
Figure 3: Class distribution of the dataset
Figure 4: Distribution of image resolutions
Figure 5: Dataset summary
...and 5 more figures
