Person Segmentation and Action Classification for Multi-Channel Hemisphere Field of View LiDAR Sensors

Svetlana Seliunina, Artem Otelepko, Raphael Memmesheimer, Sven Behnke

TL;DR

This paper proposes a MaskDINO-based method that detects and segments persons and recognizes their actions from combined, spherically projected multi-channel representations of LiDAR data with an additional positional encoding.

Abstract

Robots need to perceive persons in their surroundings, both for safety and to interact with them. In this paper, we present a person segmentation and action classification approach that operates on 3D scans from hemisphere field of view LiDAR sensors. We recorded and annotated a dataset with an Ouster OSDome-64 sensor, consisting of scenes in which persons perform three different actions. We propose a MaskDINO-based method that detects and segments persons and recognizes their actions from combined, spherically projected multi-channel representations of the LiDAR data with an additional positional encoding. Our approach demonstrates good performance on the person segmentation task and also performs well at estimating the person action states walking, waving, and sitting. An ablation study provides insights into the contribution of the individual channels to the person segmentation task. The trained models, code, and dataset are made publicly available.
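
To make the notion of a spherically projected multi-channel representation concrete, the following is a minimal NumPy sketch of a generic spherical projection. It is not the authors' implementation: the function name `spherical_projection`, the 64 x 512 image size, the elevation range of 0 to 90 degrees, and the nearest-point-wins rule are all assumptions chosen for illustration, and the positional encoding mentioned in the abstract is not specified here and is therefore left out.

```python
import numpy as np

def spherical_projection(points, features, height=64, width=512,
                         elev_min=0.0, elev_max=np.pi / 2):
    """Project N x 3 points with N x C per-point features onto an
    H x W x (1 + C) image; channel 0 holds the range.

    All defaults are illustrative assumptions, not values from the paper.
    """
    points = np.asarray(points, dtype=np.float64)
    features = np.asarray(features, dtype=np.float64)

    # Range of every point; drop returns at the origin (no valid direction).
    rng = np.linalg.norm(points, axis=1)
    keep = rng > 0
    points, features, rng = points[keep], features[keep], rng[keep]

    # Spherical angles: azimuth in [-pi, pi], elevation in [-pi/2, pi/2].
    azimuth = np.arctan2(points[:, 1], points[:, 0])
    elevation = np.arcsin(np.clip(points[:, 2] / rng, -1.0, 1.0))

    # Keep only points inside the assumed hemispherical field of view.
    eps = 1e-9
    in_fov = (elevation >= elev_min - eps) & (elevation <= elev_max + eps)
    features, rng = features[in_fov], rng[in_fov]
    azimuth, elevation = azimuth[in_fov], elevation[in_fov]

    # Map angles to pixel indices: columns wrap around in azimuth,
    # row 0 corresponds to the highest elevation (zenith).
    col = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width
    row = (elev_max - elevation) / (elev_max - elev_min) * (height - 1)
    row = np.clip(np.round(row).astype(int), 0, height - 1)

    # Write far points first so that the nearest point per pixel is kept
    # (for repeated indices, NumPy keeps the last assigned value).
    order = np.argsort(-rng)
    image = np.zeros((height, width, 1 + features.shape[1]))
    image[row[order], col[order], 0] = rng[order]
    image[row[order], col[order], 1:] = features[order]
    return image
```

Calling the function with an N x 3 point array and an N x C array of per-point measurements (for example one column per sensor channel) yields a 2D image that an image-based segmentation model could consume. Sensors that already deliver their channels as structured 2D images may not need this step at all.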

Paper Structure

This paper contains 12 sections, 2 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: We segment persons and classify their actions from spherically projected 2D representations of multi-channel hemisphere FoV LiDAR data.
  • Figure 2: The dataset collection robot setup.
  • Figure 3: Example data from the individual Ouster OSDome-64 channels.
  • Figure 4: Person segmentation examples with ground truth masks (blue).
  • Figure 5: Action recognition examples with ground truth mask (blue).
  • ...and 4 more figures