Dynamics Modeling using Visual Terrain Features for High-Speed Autonomous Off-Road Driving
Jason Gibson, Anoushka Alavilli, Erica Tevere, Evangelos A. Theodorou, Patrick Spieler
TL;DR
The paper addresses real-time terradynamics forecasting for high-speed autonomous off-road driving by integrating visual terrain features into a hybrid physics-based neural dynamics model. It uses the DINOv2 visual foundation model to extract terrain features, compresses them with an encoder trained end to end, and projects them into a lightweight 2D terrain feature map used by an MPC-based planner. A distance-robust training regimen, combining distance-independent compression with multiple feature-projection distances, enables reliable dynamics predictions across varying sensing ranges. Validated on a large RACER dataset spanning diverse rugged terrains, the approach improves predictive accuracy by roughly 10% with modest computational overhead, supporting safer and more capable autonomous off-road navigation.
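The compress-then-project step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the linear `compress_features` stands in for the learned encoder, and the function names, the feature width of 384, the compressed width of 8, and the 20 m map at 0.5 m resolution are all assumptions chosen for illustration.

```python
import numpy as np


def compress_features(features, weights):
    """Hypothetical linear stand-in for the learned encoder: (N, D) -> (N, d)."""
    return features @ weights


def build_terrain_map(points_xy, compressed, map_size_m=20.0, resolution_m=0.5):
    """Average the compressed features that fall into each cell of a
    vehicle-centered 2D grid. Returns the grid and a mask of observed cells."""
    n_cells = int(round(map_size_m / resolution_m))
    d = compressed.shape[1]
    feat_sum = np.zeros((n_cells, n_cells, d))
    counts = np.zeros((n_cells, n_cells))

    # Convert metric coordinates (vehicle at the map center) to cell indices.
    idx = np.floor((points_xy + map_size_m / 2.0) / resolution_m).astype(int)
    in_bounds = np.all((idx >= 0) & (idx < n_cells), axis=1)
    idx, feats = idx[in_bounds], compressed[in_bounds]

    # np.add.at accumulates correctly when several points hit the same cell.
    np.add.at(feat_sum, (idx[:, 0], idx[:, 1]), feats)
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1.0)

    observed = counts > 0
    feat_sum[observed] /= counts[observed][:, None]
    return feat_sum, observed


# Synthetic data only: random "visual features" at random ground positions.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 384))          # assumed per-point feature width
weights = rng.normal(size=(384, 8)) / np.sqrt(384)  # assumed compression to 8 dims
points_xy = rng.uniform(-12.0, 12.0, size=(1000, 2))

terrain_map, observed = build_terrain_map(
    points_xy, compress_features(features, weights)
)
print(terrain_map.shape)  # (40, 40, 8)
```

Compressing before rasterizing keeps the per-cell storage small, which is the property that makes such a map cheap enough to query inside a planner loop.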
Abstract
Rapid autonomous traversal of unstructured terrain is essential for scenarios such as disaster response, search and rescue, or planetary exploration. As a vehicle navigates at the limit of its capabilities over extreme terrain, its dynamics can change suddenly and dramatically. For example, high speeds and varying terrain can affect parameters such as traction, tire slip, and rolling resistance. To achieve effective planning in such environments, it is crucial to have a dynamics model that can accurately anticipate these conditions. In this work, we present a hybrid model that predicts the changing dynamics induced by the terrain as a function of visual inputs. We leverage a pre-trained visual foundation model (VFM), DINOv2, which provides rich features that encode fine-grained semantic information. To use this dynamics model for planning, we propose an end-to-end training architecture for a projection-distance-independent feature encoder that compresses the information from the VFM, enabling the creation of a lightweight map of the environment at runtime. We validate our architecture on an extensive dataset (hundreds of kilometers of aggressive off-road driving) collected across multiple locations as part of the DARPA Robotic Autonomy in Complex Environments with Resiliency (RACER) program.
Video: https://www.youtube.com/watch?v=dycTXxEosMk
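To make the idea of a hybrid model concrete, the sketch below pairs a simple physics step with a learned head that predicts terrain-dependent parameters. It is a toy under stated assumptions, not the paper's model: the one-dimensional point-mass dynamics, the linear head `terrain_params`, and the choice of a friction coefficient `mu` and rolling-resistance coefficient `c_rr` as the predicted quantities are all hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def terrain_params(feature, W, b):
    """Hypothetical learned head: terrain feature -> (mu, c_rr).
    Softplus keeps both coefficients positive."""
    raw = W @ feature + b
    mu, c_rr = np.log1p(np.exp(raw))
    return mu, c_rr


def hybrid_step(v, commanded_accel, feature, W, b, dt=0.02):
    """One explicit Euler step of a 1D point-mass model whose traction limit
    and rolling resistance come from the terrain feature."""
    mu, c_rr = terrain_params(feature, W, b)
    # Traction limit: the tires cannot deliver more than mu * g.
    a_drive = np.clip(commanded_accel, -mu * G, mu * G)
    # Rolling resistance opposes the direction of travel.
    a_roll = -c_rr * G * np.sign(v)
    return v + dt * (a_drive + a_roll)


# Synthetic example: an untrained head and a random 8-dimensional map feature.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 8))
b = np.array([0.0, -3.0])  # bias the rolling resistance toward small values
feature = rng.normal(size=8)

v = 10.0
for _ in range(50):  # roll the model forward one second at dt = 0.02
    v = hybrid_step(v, commanded_accel=2.0, feature=feature, W=W, b=b)
print(float(v))
```

In a planner, the `feature` argument would be looked up from the terrain map at each predicted position, so that the rollout reflects the terrain the vehicle is about to cross.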
