Predicting Dynamic Map States from Limited Field-of-View Sensor Data

Knut Peterson, David Han

TL;DR

This work addresses the prediction of dynamic map states from limited field-of-view sensor data, a problem that arises when autonomous systems operate under occlusions or sensor failures. It introduces a cumulative dynamic sensor projection that encodes a time window of LIDAR observations into a single grayscale image, so that standard image-to-image translation models can be used for map-state prediction. The method is validated in a 2D simulation across four obstacle/motion scenarios using eight model variants, and an ablation study shows that the time-decay encoding is essential. Predictions are precise for static obstacles; for dynamic obstacles they become blurrier and more probabilistic as the underlying sensor readings age. These results suggest the approach can improve safety and reliability when full sensing is unavailable.

Abstract

When autonomous systems are deployed in real-world scenarios, sensors are often subject to limited field-of-view (FOV) constraints, either naturally through system design, or through unexpected occlusions or sensor failures. In conditions where a large FOV is unavailable, it is important to be able to infer information about the environment and predict the state of nearby surroundings based on available data to maintain safe and accurate operation. In this work, we explore the effectiveness of deep learning for dynamic map state prediction based on limited FOV time series data. We show that by representing dynamic sensor data in a simple single-image format that captures both spatial and temporal information, we can effectively use a wide variety of existing image-to-image learning models to predict map states with high accuracy in a diverse set of sensing scenarios.

Paper Structure (15 sections, 3 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: During data collection, the robot operates in a 2D world of static (exp1 and exp2) or dynamic (exp3 and exp4) obstacles, and gathers data with a LIDAR sensor either by rotating in place (exp1 and exp3) or by moving around the world along a square path (exp2 and exp4). Each run consists of 100 time steps. At each time step, the LIDAR scan data and robot pose are recorded, and the map state image is saved at the end of each run.
  • Figure 2: An overview of our prediction method. A time window of collected LIDAR sensor data and robot pose information is first transformed into a single image using our proposed method of cumulative dynamic sensor projection. That image is then used as the input to an image-to-image prediction model which predicts a final map state based on the sensor input, and the predicted map is compared to the ground truth map state for training and evaluation.
  • Figure 3: By representing limited FOV LIDAR scan data as time-decaying grayscale pixel intensities, we capture detailed information about obstacle dynamics and sensing time frame. In experiments with static obstacles (exp1 and exp2), we see detailed outlines of obstacle locations, with pixel intensities that decay according to when each location was last sensed. In experiments with dynamic obstacles (exp3 and exp4), we see object motion recorded as pixel intensity gradients, which capture obstacle speed and location at the same time.
  • Figure 4: Example results from the U-Net model with the mit_b0 backbone for all four experimental setups. Predictions for experiments with static obstacles (exp1 and exp2) are quite precise, but miss a few obstacles due to occlusions. Predictions for experiments with dynamic obstacles (exp3 and exp4) are less precise: obstacle locations become blurry and probabilistic as sensor values become older and less reliable, or as obstacle movement produces additional occlusions. In the ablation results, removing the grayscale time decay of the sensor information degrades the predicted map states.
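
The idea behind Figures 2 and 3, folding a time window of LIDAR scans and robot poses into one grayscale image whose pixel intensity reflects how recently a location was sensed, can be illustrated with a short sketch. This is not the authors' implementation. The function name `project_scans`, the grid and world sizes, the maximum range, and the linear decay schedule are all assumptions chosen for illustration; the paper's exact decay function is not given in this summary.

```python
import numpy as np

def project_scans(scans, poses, grid_size=64, world_size=10.0, max_range=5.0):
    """Encode a window of LIDAR scans as one grayscale image (illustrative sketch).

    scans: list of (angles, ranges) pairs; angles in radians, robot frame.
    poses: list of (x, y, theta) robot poses in world coordinates, one per scan.
    Returns a float32 image in [0, 1] where more recent hits are brighter.
    All parameter defaults are hypothetical, not values from the paper.
    """
    image = np.zeros((grid_size, grid_size), dtype=np.float32)
    n_steps = len(scans)
    for t, ((angles, ranges), (x, y, theta)) in enumerate(zip(scans, poses)):
        angles = np.asarray(angles, dtype=float)
        ranges = np.asarray(ranges, dtype=float)
        hit = ranges < max_range  # beams that reach max range saw no obstacle
        # Transform hit points from the robot frame into world coordinates.
        wx = x + ranges[hit] * np.cos(theta + angles[hit])
        wy = y + ranges[hit] * np.sin(theta + angles[hit])
        # Convert world coordinates to pixel indices.
        col = np.floor(wx / world_size * grid_size).astype(int)
        row = np.floor(wy / world_size * grid_size).astype(int)
        inside = (col >= 0) & (col < grid_size) & (row >= 0) & (row < grid_size)
        # Assumed linear time decay: the oldest scan is dimmest, the newest is 1.0.
        intensity = (t + 1) / n_steps
        # Scans are processed oldest to newest, so overwriting keeps the
        # most recent (brightest) observation of each pixel.
        image[row[inside], col[inside]] = intensity
    return image
```

The resulting array could then be fed to any image-to-image model as a single-channel input. Setting `intensity` to a constant 1.0 would give a binary image with no temporal information, which is roughly the condition the ablation in Figure 4 describes.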