ForestSim: A Synthetic Benchmark for Intelligent Vehicle Perception in Unstructured Forest Environments

Pragat Wagle, Zheng Chen, Lantao Liu

Abstract

Robust scene understanding is essential for intelligent vehicles operating in natural, unstructured environments. While semantic segmentation datasets for structured urban driving are abundant, datasets for extremely unstructured wild environments remain scarce due to the difficulty and cost of generating pixel-accurate annotations. This scarcity hinders the development of perception systems needed for intelligent ground vehicles tasked with forestry automation, agricultural robotics, disaster response, and all-terrain mobility. To address this gap, we present ForestSim, a high-fidelity synthetic dataset designed for training and evaluating semantic segmentation models for intelligent vehicles in forested off-road and no-road environments. ForestSim contains 2094 photorealistic images across 25 diverse environments, covering multiple seasons, terrain types, and foliage densities. Using Unreal Engine environments integrated with Microsoft AirSim, we generate consistent, pixel-accurate labels across 20 classes relevant to autonomous navigation. We benchmark ForestSim using state-of-the-art architectures and report strong performance despite the inherent challenges of unstructured scenes. ForestSim provides a scalable and accessible foundation for perception research supporting the next generation of intelligent off-road vehicles. The dataset and code are publicly available at https://vailforestsim.github.io (dataset) and https://github.com/pragatwagle/ForestSim (code).

Paper Structure

This paper contains 17 sections, 8 figures, and 1 table.

Figures (8)

  • Figure 1: An example image from the ForestSim dataset. Object sizes, shapes, and proportions resemble those in the real world and exhibit the characteristics of an unstructured environment, where object edges are difficult to discern and nearly blend into one another.
  • Figure 2: Example RGB images of seasonal environments. These images illustrate the unstructured, off-road, and forested characteristics of complex environments.
  • Figure 3: Denser environments, such as the example on the left, required manual control. In the environment on the right, data could be collected programmatically without manual control (a minimal collection sketch follows this list).
  • Figure 4: Examples of segmentation images captured directly from AirSim are shown on the left. These images were processed by manually determining which object each RGB value corresponds to, and this mapping was used to generate the pixel-wise ground-truth labels shown on the right (a conversion sketch follows this list).
  • Figure 5: Total number of pixels per class in the dataset, in descending order (a counting sketch follows this list).
  • ...and 3 more figures
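
As referenced in the Figure 3 caption, programmatic collection pairs each RGB frame with its segmentation image through the AirSim Python client. The following is a minimal sketch, assuming a camera named "0" and uncompressed image responses; it is an illustration, not the exact collection script used for ForestSim.

```python
import airsim
import numpy as np

# Connect to the AirSim plugin running inside the Unreal Engine environment.
client = airsim.VehicleClient()
client.confirmConnection()

# Request a paired RGB (Scene) and Segmentation image from camera "0".
# The camera name is an assumption for illustration.
responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, False, False),
    airsim.ImageRequest("0", airsim.ImageType.Segmentation, False, False),
])

for response in responses:
    # Uncompressed responses arrive as a flat uint8 buffer; reshape to H x W x 3.
    flat = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
    image = flat.reshape(response.height, response.width, 3)
    # ...save `image` to disk, reposition the camera or vehicle, and repeat.
```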
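The color-to-class conversion described in the Figure 4 caption amounts to a lookup from segmentation RGB values to class indices. Below is a small sketch of that step; the palette entries and class names are hypothetical placeholders, not ForestSim's actual 20-class mapping.

```python
import numpy as np

# Hypothetical excerpt of the manually determined mapping from AirSim
# segmentation colors to class indices; the real dataset defines 20 classes.
COLOR_TO_CLASS = {
    (112, 105, 191): 0,  # e.g., "tree"
    (89, 121, 72): 1,    # e.g., "grass"
    (190, 225, 64): 2,   # e.g., "trail"
}

def rgb_to_label(seg_rgb: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    """Convert an H x W x 3 AirSim segmentation image to an H x W label map."""
    labels = np.full(seg_rgb.shape[:2], ignore_index, dtype=np.uint8)
    for color, class_id in COLOR_TO_CLASS.items():
        mask = np.all(seg_rgb == np.array(color, dtype=np.uint8), axis=-1)
        labels[mask] = class_id
    return labels
```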
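Finally, per-class pixel statistics like those plotted in Figure 5 can be accumulated directly from the label maps. This sketch assumes labels are class indices in [0, 20) with 255 as an ignore value.

```python
import numpy as np

def class_pixel_counts(label_maps, num_classes: int = 20) -> np.ndarray:
    """Sum per-class pixel counts over an iterable of H x W label maps."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for labels in label_maps:
        binned = np.bincount(labels.ravel(), minlength=num_classes)
        counts += binned[:num_classes]  # bins above num_classes (e.g., 255) are dropped
    return counts

# Classes ordered by total pixel count, descending, as plotted in Figure 5:
# order = np.argsort(-class_pixel_counts(all_label_maps))
```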