MineInsight: A Multi-sensor Dataset for Humanitarian Demining Robotics in Off-Road Environments

Mario Malizia, Charles Hamesse, Ken Hasselmann, Geert De Cubber, Nikolaos Tsiogkas, Eric Demeester, Rob Haelterman

TL;DR

MineInsight tackles the scarcity of realistic, multi-modal datasets for off-road humanitarian demining by providing a public, multi-sensor, multi-spectral dataset with dual UGV and robotic-arm viewpoints, dual LiDARs, and imagery in the visible (RGB, monochrome), VIS-SWIR, and LWIR ranges. It covers three vegetation-rich tracks with 35 targets (15 landmines, 20 distractors) captured over about one hour, incorporating automated bounding boxes refined by humans and minute climatology data. Its key novelties are the targetless camera–LiDAR calibration and the dual-view data fusion, which mitigates occlusions and enables robust evaluation of detection algorithms in cluttered environments. The dataset serves as a benchmark for domain adaptation and multi-modal fusion in realistic demining contexts, while acknowledging seasonal and environmental domain gaps and annotation limitations.

Abstract

The use of robotics in humanitarian demining increasingly involves computer vision techniques to improve landmine detection capabilities. However, in the absence of diverse and realistic datasets, the reliable validation of algorithms remains a challenge for the research community. In this paper, we introduce MineInsight, a publicly available multi-sensor, multi-spectral dataset designed for off-road landmine detection. The dataset features 35 different targets (15 landmines and 20 commonly found objects) distributed along three distinct tracks, providing a diverse and realistic testing environment. MineInsight is, to the best of our knowledge, the first dataset to integrate dual-view sensor scans from both an Unmanned Ground Vehicle and its robotic arm, offering multiple viewpoints to mitigate occlusions and improve spatial awareness. It features two LiDARs, as well as images captured at diverse spectral ranges, including visible (RGB, monochrome), visible short-wave infrared (VIS-SWIR), and long-wave infrared (LWIR). Additionally, the dataset provides bounding boxes generated by an automated pipeline and refined with human supervision. We recorded approximately one hour of data in both daylight and nighttime conditions, resulting in around 38,000 RGB frames, 53,000 VIS-SWIR frames, and 108,000 LWIR frames. MineInsight serves as a benchmark for developing and evaluating landmine detection algorithms. Our dataset is available at https://github.com/mariomlz99/MineInsight.

Paper Structure

This paper contains 15 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Illustration of the different sensors mounted on the UGV. Left - The UGV is ready for data collection. Top right - Sensors installed on the robotic arm. Bottom right - Sensors mounted directly on the mobile base.
  • Figure 2: Overview of all the objects included in the dataset. The objects on the left side of the black line are landmines (L), while the objects on the right side of the black line are common items (C). Landmines (L) - L1: PFM-1 "Butterfly", L2: PMN, L3: TC-3.6, L4: M35, L5: C-3 "Elsie", L6: M6 (grey), L7: TMA-2, L8: MON-50, L9: MON-90, L10: M6 (blue), L11: TMM-1, L12: Type 72 (P), L13: TM-46, L14: PROM-1, L15: VS-50. Common items (C) - C1: Soda metal can, C2: Disposable paper cup, C3: Sponge, C4: Plastic charger, C5: Metal pot, C6: Glass vinegar bottle, C7: Plastic water bottle (significantly crumpled), C8: Plastic shampoo bottle, C9: Plastic water bottle (partially crumpled), C10: Plastic water bottle (slightly crumpled), C11: Glass pepper dispenser, C12: Glass jar (grey cover), C13: Plastic chips bag (slightly crumpled), C14: Metal coke can, C15: Metal tuna can, C16: Glass beer bottle, C17: Glass jar (green cover), C18: Metal corn tin, C19: Plastic chips bag (significantly crumpled), C20: Plastic cup.
  • Figure 3: Overview of the three tracks in the dataset. The first row displays an image of each track to show the condition of the vegetation. The second row presents a top-view point cloud representation, highlighting the locations of targets and illustrating their distribution across the tracks. Scale: 1m (bottom left).
  • Figure 4: Left - Arm stow and UGV motion: The manipulator is positioned relative to the UGV’s body, with the vehicle’s forward motion direction indicated by the red arrow. Center - End effector position in the first sequence: The arm is extended to show the initial placement and orientation of the end effector. Right - End effector position in the second sequence: The rotations of the arm’s wrist joints are highlighted. The angles $\theta_1$ and $\theta_2$ correspond to the rotational movements of wrist joint 1 and wrist joint 2, respectively.
  • Figure 5: Left - The track is ready to sample the reference sequence. Right - A MON-50 landmine and its corresponding AprilTag. The pose of each AprilTag, denoted as $A_i$, is evaluated, allowing determination of the position of the corresponding target $T_i$.
  • ...and 1 more figure
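
The Figure 5 caption states that the pose $A_i$ of each AprilTag is used to determine the position of its target $T_i$. A minimal sketch of one way such a mapping could work is given below. It assumes the tag pose is available as a 4x4 homogeneous transform in some reference frame and that the target sits at a known offset expressed in the tag frame. The helper names (`pose_to_matrix`, `target_position`), the offset, and all numeric values are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def pose_to_matrix(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(rotation, dtype=float)
    T[:3, 3] = np.asarray(translation, dtype=float)
    return T

def target_position(A_i, offset_in_tag):
    """Map a point expressed in the tag frame into the reference frame."""
    p = np.append(np.asarray(offset_in_tag, dtype=float), 1.0)  # homogeneous point
    return (A_i @ p)[:3]

# Hypothetical tag pose: 2 m along x in the reference frame,
# rotated 90 degrees about the z axis.
theta = np.pi / 2
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
A_i = pose_to_matrix(R, [2.0, 0.0, 0.0])

# Assumed offset: target placed 0.3 m along the tag's own x axis.
T_i = target_position(A_i, [0.3, 0.0, 0.0])
print(T_i)
```

With these made-up values the tag's x axis points along the reference y axis, so the target lands 0.3 m to the side of the tag origin rather than further ahead of it. The actual tag-to-target relationship used by the authors may differ.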