Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots

Siva Krishna Ravipati, Ehsan Latif, Ramviyas Parasuraman, Suchendra M. Bhandarkar

TL;DR

This work tackles the challenge of material-aware perception for mobile robots by integrating RGB-D data with SLAM to produce a 3D semantic map that jointly encodes object and material information. The authors propose a three-pillar approach: an object-detection pipeline (YOLOv5) to locate objects, a complementarity-aware fusion network (CAFN) for robust RGB-D material classification, and a voxel-based multiscale clustering framework (VOXM with MSCC) to propagate material labels into the 3D map generated by ORB-SLAM2. Experimental results on public RGB and RGB-D datasets, plus real-world robot deployments, show up to 15% improvement in material classification and 3D clustering accuracy over state-of-the-art baselines, with mean IoU ~0.8 and mAP ~0.65 in real deployments. The practical significance includes richer semantic maps for planning, interaction, and multi-robot collaboration, backed by open-source code and new RGB-D datasets.
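
For readers unfamiliar with the mean IoU figure quoted above, the following is a minimal sketch of one common way to compute it over per-point material labels. It assumes the usual per-class intersection-over-union averaged over the classes present in the ground truth; this section does not state the exact metric definition used by the authors, and the function name `mean_iou` and the label strings are hypothetical.

```python
def mean_iou(pred, truth):
    """Per-class IoU over point labels, averaged across ground-truth classes.

    Assumption: this is the common definition of mean IoU, not necessarily
    the exact evaluation protocol used in the paper.
    """
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and equally long")
    ious = []
    for c in sorted(set(truth)):
        # Points where both prediction and ground truth are class c
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        # Points where either prediction or ground truth is class c
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        ious.append(inter / union)  # union >= 1 because c occurs in truth
    return sum(ious) / len(ious)


# Hypothetical labels for four map points
pred = ["wood", "wood", "metal", "metal"]
truth = ["wood", "metal", "metal", "metal"]
score = mean_iou(pred, truth)
```

Here "wood" has IoU 1/2 and "metal" has IoU 2/3, so the mean is 7/12.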

Abstract

Classification of different object surface material types can play a significant role in the decision-making algorithms for mobile robots and autonomous vehicles. RGB-based scene-level semantic segmentation has been well-addressed in the literature. However, improving material recognition using the depth modality and its integration with SLAM algorithms for 3D semantic mapping could unlock new potential benefits in the robotics perception pipeline. To this end, we propose a complementarity-aware deep learning approach for RGB-D-based material classification built on top of an object-oriented pipeline. The approach further integrates the ORB-SLAM2 method for 3D scene mapping with multiscale clustering of the detected material semantics in the point cloud map generated by the visual SLAM algorithm. Extensive experimental results with existing public datasets and newly contributed real-world robot datasets demonstrate a significant improvement in material classification and 3D clustering accuracy compared to state-of-the-art approaches for 3D semantic scene mapping.

Paper Structure (13 sections, 4 equations, 16 figures, 14 tables, 1 algorithm)

Figures (16)

  • Figure 1: Illustration of the object-oriented 3D semantic mapping with material-level information (right) based on the RGB-D point clouds (left), shown along with the labels.
  • Figure 2: Architectural overview of the proposed object-oriented 3D semantic mapping with material labels. The visual SLAM ($\mathtt{VSLAM}$) component generates the point cloud map, and the YOLO component ($\mathtt{OBJ}$) detects objects and locates the bounding boxes of the objects in the images. The material classification network ($\mathtt{MCN}$) classifies the objects in the bounding boxes into different material classes. The voxel-based matching component ($\mathtt{VOXM}$) uses the point cloud map generated by visual SLAM and the material labels obtained from the material classification component to match the 3D coordinates of the bounding boxes with the 3D coordinates of the point cloud and propagate the material labels to the points in the point cloud.
  • Figure 3: Confusion Matrix of Material Classification Network
  • Figure 4: RGB image, point cloud, and semantic material map comparison with [hempel2022online] (bottom colored point cloud) on two different sequences of the TUM RGB-D dataset [keimel2012tum], with their corresponding color representations: (a) fr2_desk sequence, (b) fr3_sitting_xyz sequence. The colored labels of the respective material classes are shown in Fig. \ref{fig:semantic_map}.
  • Figure 5: Robot setup for the real-world mobile robot RGB-D experiments and dataset collection.
  • ...and 11 more figures
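
The voxel-based matching step described in the Figure 2 caption (matching the 3D coordinates of labeled bounding boxes against the point cloud and propagating material labels to the points) can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' VOXM implementation: boxes are assumed to be axis-aligned and already expressed in map coordinates, the multiscale clustering (MSCC) is omitted, and the names `voxel_key` and `propagate_labels` are hypothetical.

```python
import math
from collections import defaultdict


def voxel_key(point, voxel_size):
    """Integer voxel index containing a 3D point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def propagate_labels(points, boxes, voxel_size=0.5, unlabeled="unknown"):
    """Give each map point the material label of the first box covering its voxel.

    points: list of (x, y, z) tuples from the SLAM point cloud.
    boxes:  list of (min_corner, max_corner, material) tuples; axis-aligned
            3D boxes are an assumption made for this sketch.
    """
    # Hash every point into its voxel so whole voxels can be matched at once
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[voxel_key(p, voxel_size)].append(i)

    labels = [unlabeled] * len(points)
    for lo, hi, material in boxes:
        klo, khi = voxel_key(lo, voxel_size), voxel_key(hi, voxel_size)
        for key, indices in grid.items():
            # A voxel matches when its index lies inside the box's voxel range
            if all(a <= k <= b for a, k, b in zip(klo, key, khi)):
                for i in indices:
                    if labels[i] == unlabeled:  # earlier boxes take priority
                        labels[i] = material
    return labels


# Hypothetical map points and labeled boxes
points = [(0.1, 0.1, 0.1), (2.2, 2.2, 2.2), (5.0, 5.0, 5.0)]
boxes = [((0.0, 0.0, 0.0), (0.4, 0.4, 0.4), "wood"),
         ((2.0, 2.0, 2.0), (2.4, 2.4, 2.4), "metal")]
labels = propagate_labels(points, boxes)
```

Matching at voxel granularity trades spatial precision for robustness to small pose and depth errors; the voxel size of 0.5 here is an arbitrary value chosen for the example.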