GrandTour: A Legged Robotics Dataset in the Wild for Multi-Modal Perception and State Estimation

Jonas Frey, Turcan Tuna, Frank Fu, Katharine Patterson, Tianao Xu, Maurice Fallon, Cesar Cadena, Marco Hutter

TL;DR

GrandTour tackles the lack of public, real-world, multi-modal data for legged robotics by introducing a large-scale, multi-sensor dataset collected on ANYmal-D with the Boxi payload. It provides 49 missions across indoor, urban, and natural environments with synchronized LiDARs, cameras, depth sensors, IMUs, and dual RTK-GNSS, complemented by centimeter- to millimeter-level ground truth from satellite references and a Leica total station. The work also details calibration, data formats (Zarr/JPEG and ROS bags), and post-processed outputs, and benchmarks 52 open-source pipelines across six missions to expose the strengths and weaknesses of LiDAR odometry (LO), LiDAR-inertial and LiDAR-inertial-visual odometry (LIO/LIVO), multi-LiDAR, and visual-inertial odometry (VIO) approaches. Beyond benchmarking, GrandTour supports perception, sim-to-real transfer, and navigation research for legged autonomy, and is intended to serve as a long-term, open benchmark for the field.

Abstract

Accurate state estimation and multi-modal perception are prerequisites for autonomous legged robots in complex, large-scale environments. To date, no large-scale public legged-robot dataset captures the real-world conditions needed to develop and benchmark algorithms for legged-robot state estimation, perception, and navigation. To address this, we introduce the GrandTour dataset, a multi-modal legged-robotics dataset collected across challenging outdoor and indoor environments, featuring an ANYbotics ANYmal-D quadruped equipped with the Boxi multi-modal sensor payload. GrandTour spans a broad range of environments and operational scenarios across distinct test sites, ranging from alpine scenery and forests to demolished buildings and urban areas, and covers a wide variation in scale, complexity, illumination, and weather conditions. The dataset provides time-synchronized sensor data from spinning LiDARs, multiple RGB cameras with complementary characteristics, proprioceptive sensors, and stereo depth cameras. Moreover, it includes high-precision ground-truth trajectories from satellite-based RTK-GNSS and a Leica Geosystems total station. This dataset supports research in SLAM, high-precision state estimation, and multi-modal learning, enabling rigorous evaluation and development of new approaches to sensor fusion in legged robotic systems. With its extensive scope, GrandTour represents the largest open-access legged-robotics dataset to date. The dataset is available at https://grand-tour.leggedrobotics.com, on HuggingFace (ROS-independent), and in ROS formats, along with tools and demo resources.

Paper Structure (40 sections, 3 equations, 12 figures, 12 tables)

Figures (12)

  • Figure 1: GrandTour dataset preview. Top: views of ANYmal traversing diverse environments during GrandTour; the inset summarizes the sensor suite and dataset scale (49 missions, >10 km, >5 h). Bottom: aerial imagery alongside on-board RGB views from a subset of the cameras for six missions of GrandTour (ETH-1, PIL-1, EIG-1, SPX-2, HEAP-1, ARC-2). The path color indicates the frequency with which the robot is at that location.
  • Figure 2: Sensor placement visualization of the entire GrandTour sensor suite. A) shows the spatial relationships of the sensors on the Boxi payload, and B) shows the sensors of the ANYmal base platform components. Each colored axis represents a sensor or joint frame. For brevity, repeated sensors are not shown (e.g., 6× Intel RealSense D435i depth cameras).
  • Figure 3: System architecture and sensor interfaces of the combined Boxi–ANYmal platform. All compute units (Jetson AGX Orin, Intel NUC, Raspberry Pi) are connected to the UbiSwitch Ethernet device, and the sensors are connected to their respective compute platforms via various interfaces, such as USB 3.1, GMSL2, RJ45, and I²C.
  • Figure 4: Top-down projection of range observations around the robot in GrandTour (scale bar: 3 m), illustrating the complementary coverage of the sensor suite. A) Combined view with color-coding: Hesai LiDAR (red), Livox LiDAR (blue), VLP16 (green), and ANYmal-Depth cameras (yellow). B)–D) For clarity, the sensor(s) of interest are shown in color while all other measurements are shown in gray. B) Hesai LiDAR. C) Livox LiDAR. D) ANYmal-mounted sensors: VLP16 and ANYmal-Depth.
  • Figure 5: The calibration provided with the GrandTour dataset is validated through the overlay of point clouds from different LiDARs onto all available RGB images of GrandTour. As shown, the point clouds align with the correct visual features in the image. Points are colorized by their depth along the camera axis; red indicates closer points, and blue indicates farther points.
  • ...and 7 more figures
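
The overlay check described in the Figure 5 caption (LiDAR points drawn on an RGB image and colored by depth along the camera axis) can be illustrated with a minimal sketch. This is not the authors' tooling: it assumes an ideal pinhole camera without lens distortion, and the names `project_points`, `depth_to_rgb`, `T_cam_lidar`, and all numeric values below are hypothetical placeholders rather than GrandTour calibration data.

```python
import numpy as np


def project_points(points_lidar, T_cam_lidar, K, width, height):
    """Project Nx3 LiDAR points into a pinhole image.

    Returns the pixel coordinates (Mx2) and depths (M,) of the points that
    lie in front of the camera and inside the image bounds.
    """
    # Move the points into the camera frame using the 4x4 extrinsic transform.
    ones = np.ones((points_lidar.shape[0], 1))
    points_cam = (T_cam_lidar @ np.hstack([points_lidar, ones]).T).T[:, :3]

    # Keep only points with positive depth along the camera z-axis.
    depth = points_cam[:, 2]
    in_front = depth > 0.0
    points_cam, depth = points_cam[in_front], depth[in_front]

    # Pinhole projection (assumption: no distortion model is applied).
    uvw = (K @ points_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]

    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv[inside], depth[inside]


def depth_to_rgb(depth, d_min, d_max):
    """Map depth linearly to color: red for near points, blue for far points."""
    t = np.clip((depth - d_min) / (d_max - d_min), 0.0, 1.0)
    return np.stack([1.0 - t, np.zeros_like(t), t], axis=1)


# Hypothetical intrinsics and extrinsics, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T_cam_lidar = np.eye(4)
points = np.array([[0.0, 0.0, 2.0],    # on the optical axis, 2 m away
                   [1.0, 0.0, 5.0],    # right of center, 5 m away
                   [0.0, 0.0, -1.0],   # behind the camera, dropped
                   [10.0, 0.0, 1.0]])  # projects outside the image, dropped

uv, depth = project_points(points, T_cam_lidar, K, width=640, height=480)
colors = depth_to_rgb(depth, d_min=2.0, d_max=5.0)
```

With a correct extrinsic calibration, the colored pixels should land on the matching visual features in the image. A real camera would additionally need its distortion model applied before comparing against the image.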