Garbage Segmentation and Attribute Analysis by Robotic Dogs

Nuo Xu, Jianfeng Liao, Qiwei Meng, Wei Song

TL;DR

This work introduces GSA2Seg, a cloud-based, vision–language–driven framework that performs garbage segmentation and attribute analysis to inform robotic grasping in diverse environments. By fusing visual features with language prompts via bi-directional attention in an encoder–decoder, GSA2Seg predicts precise masks, bounding boxes, and object attributes, enabling state-aware manipulation. The authors contribute the GSA2D dataset, comprising 3119 images across 10 garbage types with attributes such as placement state (standing/lying/deformation) and position (ground/platform), to benchmark open-vocabulary attribute-aware segmentation. Experimental results show GSA2Seg achieving state-of-the-art AP and AP50 while maintaining competitive FPS, with ablations demonstrating the value of language prompts and attribute analysis for robustness and generalization in robotic waste management.

Abstract

Efficient waste management and recycling rely heavily on garbage exploration and identification. In this study, we propose GSA2Seg (Garbage Segmentation and Attribute Analysis), a novel visual approach that uses quadruped robotic dogs as autonomous agents to address waste management and recycling challenges in diverse indoor and outdoor environments. Equipped with an advanced visual perception system, including visual sensors and instance segmenters, the robotic dogs navigate their surroundings and search for common garbage items. Inspired by open-vocabulary algorithms, we introduce a method for object attribute analysis. By combining garbage segmentation with attribute analysis, the robotic dogs determine the state of the trash, including its position and placement properties. This information improves the robotic arm's grasping, facilitating successful garbage retrieval. Additionally, we contribute an image dataset, named GSA2D, to support evaluation. Through extensive experiments on GSA2D, this paper provides a comprehensive analysis of GSA2Seg's effectiveness. Dataset available at: https://www.kaggle.com/datasets/hellob/gsa2d-2024
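
To make the attribute vocabulary concrete, the following is a minimal sketch of how one annotated instance could be represented, assuming the placement states (standing/lying/deformation) and positions (ground/platform) listed in the TL;DR. The names `Placement`, `Position`, `GarbageInstance`, and `describe`, the bounding-box convention, and the example category `"bottle"` are hypothetical; they are not taken from the GSA2D annotation format, which the text above does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Placement(Enum):
    """Placement states named in the TL;DR."""
    STANDING = "standing"
    LYING = "lying"
    DEFORMATION = "deformation"


class Position(Enum):
    """Position attributes named in the TL;DR."""
    GROUND = "ground"
    PLATFORM = "platform"


@dataclass(frozen=True)
class GarbageInstance:
    # Hypothetical record layout; the real GSA2D schema may differ.
    category: str                             # one of the 10 garbage types (names not given in the source)
    placement: Placement
    position: Position
    bbox: Tuple[float, float, float, float]   # assumed (x_min, y_min, x_max, y_max) in pixels


def describe(instance: GarbageInstance) -> str:
    """Render the category together with its two attributes as one label string."""
    return f"{instance.category} ({instance.placement.value}, {instance.position.value})"


if __name__ == "__main__":
    example = GarbageInstance("bottle", Placement.LYING, Position.GROUND, (10.0, 20.0, 50.0, 80.0))
    print(describe(example))
```

Keeping the two attributes as separate enumerations gives six placement/position combinations per category, which is the kind of state information a grasp planner could branch on.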

Paper Structure (10 sections, 3 figures, 4 tables)

This paper contains 10 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Garbage segmentation and attribute analysis. The robot dog utilizes visual sensors to gather environmental observation information, which is then processed by the cloud-based GSA2Seg segmenter. This cutting-edge technology enables the robot dog to perform instance segmentation and attribute analysis on garbage objects, providing accurate object masks and attributes for informed decision-making and control.
  • Figure 2: Pipeline of our GSA2Seg method. The robot dog utilizes its depth camera on the head to perceive visual signals, which are then transmitted to the cloud-based GSA2Seg system. GSA2Seg comprises an encoder and a decoder. The encoder processes the visual signals and language prompts separately and fuses them using a bi-directional attention mechanism before sending them to the decoder. Within the decoder, randomly initialized object queries are combined with visual features through attention mechanisms and used for predicting masks and bounding boxes. Additionally, these merged features are attentively combined with language features to facilitate attribute analysis.
  • Figure 3: Qualitative results (zoom in for detailed viewing). The figure presents a subset of our visualization results, showing images captured from diverse perspectives in both indoor and outdoor environments using two cameras. The predicted attributes and categories of the detected garbage are shown as labels above the masks and bounding boxes.
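
The bi-directional fusion described in the Figure 2 caption can be illustrated with a small, dependency-free sketch. It is an assumption-laden toy and not the GSA2Seg implementation: it uses single-head scaled dot-product attention with no learned projections, lets the keys double as values, and adds a residual connection. The names `cross_attend` and `bidirectional_fuse` are hypothetical. The point it shows is only the two-way flow: visual tokens attend to language tokens, and language tokens attend to visual tokens, both computed from the same un-fused inputs.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, keys: Matrix) -> Matrix:
    """Replace each query row by a softmax-weighted average of the key rows.

    Simplification: keys are reused as values and there are no learned weights.
    """
    dim = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return attended


def bidirectional_fuse(visual: Matrix, text: Matrix) -> Tuple[Matrix, Matrix]:
    """Update both modalities from the original inputs, each with a residual add."""
    vis_ctx = cross_attend(visual, text)   # visual tokens gather language context
    txt_ctx = cross_attend(text, visual)   # language tokens gather visual context
    fused_visual = [[a + b for a, b in zip(v, c)] for v, c in zip(visual, vis_ctx)]
    fused_text = [[a + b for a, b in zip(t, c)] for t, c in zip(text, txt_ctx)]
    return fused_visual, fused_text


if __name__ == "__main__":
    visual_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy image features
    text_tokens = [[0.5, 0.5]]                             # one toy prompt embedding
    fused_v, fused_t = bidirectional_fuse(visual_tokens, text_tokens)
    print(fused_v)
    print(fused_t)
```

In a real encoder, each direction would typically have its own learned query, key, and value projections and several heads; the sketch leaves those out so the data flow stays visible.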