FROG: A new people detection dataset for knee-high 2D range finders

Fernando Amodeo, Noé Pérez-Higueras, Luis Merino, Fernando Caballero

TL;DR

FROG addresses the challenge of detecting humans with knee-high 2D LiDAR by releasing a fully annotated, diverse 2D LiDAR dataset collected in a public space, along with a fast end-to-end detector and a benchmarking suite. It introduces two deep networks, Laser Feature Extractor (LFE) and People Proposal Network (PPN), that operate directly on raw scans, enabling high-speed ROS inference and reducing reliance on hand-crafted preprocessing. The paper benchmarks several detectors (DROW3, DR-SPAAM, PeTra) against the proposed methods, showing competitive accuracy with notably faster inference for LFE/PPN, and discusses annotation tooling, data formats, and evaluation methodology to standardize 2D LiDAR-based people detection research. Overall, FROG advances practical human detection for mobile robots using 2D LiDAR and provides a reusable framework for future improvements and extensions, including self-supervised learning and sensor fusion approaches.

Abstract

Mobile robots require knowledge of their environment, especially of the humans in their vicinity. While the most common approaches for detecting humans involve computer vision, robots' 2D range finders are an often overlooked hardware feature that can also be used for people detection. These sensors were originally intended for obstacle avoidance and mapping/SLAM tasks. In most robots, they are conveniently mounted at a height approximately between the ankle and the knee, so they can also be used to detect people, with a larger field of view and better depth resolution than cameras. In this paper, we present FROG, a new dataset for people detection using knee-high 2D range finders. Compared to existing datasets such as DROW, this dataset has greater laser resolution, a higher scanning frequency, and more complete annotations. In particular, the FROG dataset contains annotations for 100% of its laser scans (unlike DROW, which annotates only 5%), 17x more annotated scans, 100x more people annotations, and over twice the distance traveled by the robot. We propose a benchmark based on the FROG dataset and analyze a collection of state-of-the-art people detectors based on 2D range finder data. We also propose and evaluate a new end-to-end deep learning approach for people detection. Our solution works directly with the raw sensor data (it does not need hand-crafted input features), thus avoiding CPU preprocessing and relieving the developer of the need to understand domain-specific heuristics. Experimental results show that the proposed people detector attains results comparable to the state of the art, while an optimized implementation for ROS can operate at more than 500 Hz.
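
The benchmark compares detector outputs against the annotated people positions. The exact evaluation protocol is defined in the paper. As a rough illustration only, the sketch below assumes a common scheme: each detection (an (x, y) position in meters plus a confidence score) is greedily matched to the nearest still-unmatched annotation within a fixed radius, visiting detections in descending score order. Precision and recall are then computed at a single operating point. The function names and the 0.5 m default radius are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in meters


def match_detections(
    detections: List[Tuple[float, float, float]],  # (x, y, score)
    annotations: List[Point],
    radius: float = 0.5,  # hypothetical association radius in meters
) -> Tuple[int, int, int]:
    """Greedily match detections to annotated people; return (TP, FP, FN)."""
    unmatched = list(range(len(annotations)))
    tp = 0
    # Visit the most confident detections first.
    for x, y, _score in sorted(detections, key=lambda d: d[2], reverse=True):
        best, best_dist = None, radius
        for i in unmatched:
            ax, ay = annotations[i]
            dist = math.hypot(x - ax, y - ay)
            if dist <= best_dist:
                best, best_dist = i, dist
        if best is not None:
            unmatched.remove(best)  # each annotation can be matched only once
            tp += 1
    fp = len(detections) - tp  # detections with no annotation nearby
    fn = len(unmatched)        # annotated people that were missed
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall, defined as 0.0 when the denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Sweeping a score threshold and recomputing precision and recall at each value would yield a precision-recall curve, from which a summary metric such as average precision can be derived.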

Paper Structure (21 sections, 1 equation, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Left: image of the robot platform used for recording. Right: reference frames of the robot. The front mounted 2D LiDAR sensor (laserfront) is placed at X = 0.22 m and Z = 0.33 m with respect to the base of the robot (base_link).
  • Figure 2: Example navigation plan used by the robot during capture of the FROG dataset.
  • Figure 3: Main interface of the laser scan labeling tool. The tool displays the laser scan and the video feed from a camera topic side by side, and allows the user to easily create and track annotations using the mouse.
  • Figure 4: Example annotated laser scan showing the coordinate system used in the FROG dataset, matching the standard conventions used in robotics. The distances shown are in meters. Blue dots: points from the scan. Green circles: annotated people.
  • Figure 5: Laser Feature Extractor (LFE) network architecture, applied to a segmentation task. Each 1D convolutional block consists of three consecutive 1D depthwise separable convolutions [separable2017] with kernel sizes 9, 7 and 5, respectively. Some blocks also contain a global feature aggregator, which performs a global max pool over the input and concatenates the resulting features to each individual position of the input. Finally, a residual path adds the input of the block to the output of the last convolution. The segmentation mask is generated by an "inverse" LFE similar to U-Net [unet2015], followed by a pointwise convolution that produces the final output logits.
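
Figures 1 and 4 give the geometry needed to place scan points in the robot frame. The following is a minimal sketch under several assumptions. Scans follow ROS sensor_msgs/LaserScan conventions, so beam i lies at angle_min + i * angle_increment, measured counterclockwise from the sensor's forward x axis. Axes follow REP 103 (x forward, y left, z up). The transform from laserfront to base_link is a pure translation. Figure 1 gives only the X and Z offsets, so a zero Y offset and identical orientation are assumptions. The function name is hypothetical.

```python
import math
from typing import List, Tuple

# Offsets of laserfront relative to base_link, from Figure 1 (meters).
# Assumption: the Y offset is zero and both frames share the same orientation.
LASER_X = 0.22
LASER_Z = 0.33


def scan_to_base_link(
    ranges: List[float],
    angle_min: float,
    angle_increment: float,
    range_min: float = 0.0,
    range_max: float = float("inf"),
) -> List[Tuple[float, float, float]]:
    """Convert a 2D scan (LaserScan-style fields) to (x, y, z) points in base_link."""
    points = []
    for i, r in enumerate(ranges):
        # Skip invalid returns: NaN/inf or outside the sensor's valid range.
        if not math.isfinite(r) or r < range_min or r > range_max:
            continue
        theta = angle_min + i * angle_increment  # counterclockwise from +x
        x_laser = r * math.cos(theta)
        y_laser = r * math.sin(theta)
        # Pure translation from laserfront to base_link (assumed, see above).
        points.append((x_laser + LASER_X, y_laser, LASER_Z))
    return points
```

For example, a 1 m return straight ahead of the sensor maps to a point 1.22 m in front of base_link, 0.33 m above it.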
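
The Figure 5 caption describes one LFE convolutional block: three 1D depthwise separable convolutions with kernel sizes 9, 7 and 5, an optional global feature aggregator, and a residual path. The pure-Python sketch below mirrors that description for illustration only. It is not the authors' implementation, and several details the caption leaves open are assumptions. Padding is zero "same" padding, and ReLU is applied between convolutions. The aggregator acts on the block input before the first convolution. There are no biases or normalization layers. The weight layout and the helper names (init_block, lfe_block, and so on) are hypothetical.

```python
import random
from typing import List

Tensor = List[List[float]]  # [channels][positions along the scan]

KERNEL_SIZES = (9, 7, 5)  # from the Figure 5 caption


def depthwise_conv1d(x: Tensor, weights: Tensor) -> Tensor:
    """Per-channel 1D cross-correlation with zero 'same' padding (odd kernels)."""
    n = len(x[0])
    out = []
    for xc, wc in zip(x, weights):
        half = len(wc) // 2
        row = []
        for i in range(n):
            acc = 0.0
            for j, w in enumerate(wc):
                src = i + j - half
                if 0 <= src < n:
                    acc += w * xc[src]
            row.append(acc)
        out.append(row)
    return out


def pointwise_conv1d(x: Tensor, weights: Tensor) -> Tensor:
    """1x1 convolution mixing channels at each position; weights is [out][in]."""
    n = len(x[0])
    return [[sum(w * x[c][i] for c, w in enumerate(wo)) for i in range(n)] for wo in weights]


def global_feature_aggregator(x: Tensor) -> Tensor:
    """Global max pool per channel, concatenated to every position."""
    n = len(x[0])
    return x + [[max(xc)] * n for xc in x]


def relu(x: Tensor) -> Tensor:
    return [[max(0.0, v) for v in row] for row in x]


def init_block(channels: int, aggregate: bool, rng: random.Random, scale: float = 0.1) -> list:
    """Hypothetical weight layout: one (depthwise, pointwise) pair per kernel size."""
    params = []
    c_in = 2 * channels if aggregate else channels  # aggregator doubles channels
    for k in KERNEL_SIZES:
        dw = [[rng.uniform(-scale, scale) for _ in range(k)] for _ in range(c_in)]
        pw = [[rng.uniform(-scale, scale) for _ in range(c_in)] for _ in range(channels)]
        params.append((dw, pw))
        c_in = channels  # later convolutions keep the block's channel count
    return params


def lfe_block(x: Tensor, params: list, aggregate: bool) -> Tensor:
    h = global_feature_aggregator(x) if aggregate else x
    for idx, (dw, pw) in enumerate(params):
        h = pointwise_conv1d(depthwise_conv1d(h, dw), pw)  # depthwise separable conv
        if idx < len(params) - 1:
            h = relu(h)  # assumed activation between convolutions
    # Residual path: add the block input to the output of the last convolution.
    return [[a + b for a, b in zip(hr, xr)] for hr, xr in zip(h, x)]
```

Because the first pointwise convolution maps the concatenated channels back to the block width, the residual addition is always shape-compatible. For a full 360° scan, circular padding could be a natural alternative to zero padding. The caption does not say which is used.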