Experimental investigation of pose informed reinforcement learning for skid-steered visual navigation

Ameya Salvi, Venkat Krovi

TL;DR

This work tackles vision-based lane keeping for skid-steered robots by introducing a pose-informed deep reinforcement learning (DRL) framework that uses waypoint-guided learning with arc-length clothoid paths. It formalizes the problem as a Markov decision process (MDP) with a structured track, a clothoid-based reference path, and a reward that jointly penalizes pose, velocity, and action effort; learning can be conducted in inverse-kinematics (IK) or end-to-end modes. Through extensive simulations and hardware experiments, it demonstrates that waypoint-guided learning yields robust sim-to-real transfer, outperforms several formal methods on high-curvature tracks, and generalizes beyond cone markers to unseen visual features. The study also analyzes the effects of waypoint spacing, look-ahead adjustments, and sensor dropouts, offering practical insights for deploying pose-informed policies in real autonomy stacks.
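
As a rough illustration of what waypoints on an arc-length clothoid path could look like, the sketch below samples poses at a fixed arc-length spacing `ds` along a curve whose curvature grows linearly with arc length. The function name `clothoid_waypoints`, its parameters (`kappa0`, `kappa_rate`, `n_sub`), and the midpoint-rule integration are assumptions made for this example; the summary does not state how the paper constructs its reference path.

```python
import math


def clothoid_waypoints(length, ds, kappa0=0.0, kappa_rate=0.5, n_sub=200):
    """Return (x, y, heading) waypoints spaced ds apart in arc length.

    Hypothetical helper: curvature is assumed linear in arc length s,
    kappa(s) = kappa0 + kappa_rate * s, starting at the origin with heading 0.
    """
    def heading(s):
        # Heading is the integral of curvature over arc length.
        return kappa0 * s + 0.5 * kappa_rate * s * s

    n_points = int(math.floor(length / ds + 1e-9)) + 1
    waypoints = [(0.0, 0.0, 0.0)]
    x, y = 0.0, 0.0
    h = ds / n_sub  # integration step inside one waypoint interval
    for i in range(1, n_points):
        s0 = (i - 1) * ds
        # Midpoint-rule integration of (cos(heading), sin(heading)).
        for j in range(n_sub):
            s_mid = s0 + (j + 0.5) * h
            x += math.cos(heading(s_mid)) * h
            y += math.sin(heading(s_mid)) * h
        waypoints.append((x, y, heading(i * ds)))
    return waypoints


if __name__ == "__main__":
    for wp in clothoid_waypoints(length=2.0, ds=0.5):
        print("x=%.3f  y=%.3f  heading=%.3f rad" % wp)
```

Setting `kappa_rate=0` reduces the curve to a straight line (`kappa0=0`) or a circular arc (`kappa0` nonzero), which makes the integration easy to sanity-check.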

Abstract

Vision-based lane keeping is of significant interest to the robotics and autonomous ground vehicle communities for both on-road and off-road applications. The skid-steered architecture has served as a useful platform for human-controlled operations. However, the difficulty of systematically modeling these vehicles, especially the skid-slip wheel-terrain interactions (primarily in off-road settings), has created bottlenecks for automated deployment. End-to-end learning-based methods, such as imitation learning and deep reinforcement learning, have gained prominence as a viable option to counter the lack of accurate analytical models. However, their systematic formulation and subsequent verification and validation in dynamic operating regimes (particularly for skid-steered vehicles) remain a work in progress. To this end, this work proposes and investigates a novel structured formulation for learning visual navigation. Extensive software simulations, hardware evaluations, and ablation studies highlight the significantly improved performance of the proposed approach relative to contemporary literature.

Paper Structure

This paper contains 24 sections, 13 equations, 17 figures, 8 tables, 1 algorithm.

Figures (17)

  • Figure 1: (Top Row) Several scenarios where lanes are either created or realized by sparse visual markers: (a) rows of crops, (b) cones for lane modifications, (c) trees in vineyards, and (d) radium road markers. (Bottom Row) A typical vision-based path-following framework involves (i) capturing and processing vision data, which allows the extraction of meaningful environmental features, (ii) associating those features in context with the relevant robot control task, and (iii) planning and executing the control task based on the previously determined association.
  • Figure 2: Schematic overview of the visual navigation problem in the form of driving within lane markers. (A) Left: The deployment phase, in which preview images are simplified to distill lane features, serving as input to the control policy that generates reference velocities in the robot body frame. The reference velocities are tracked using an inverse kinematics model tuned from physical data. (B) Right: The back-end learning process, which involves mapping the input images to the geometric error between lane centre and robot centre (discussed in detail in Section \ref{subsec:guided_learning}).
  • Figure 3: (a) Two arbitrary closed-loop curves. (b) The width of the lane is roughly 2.5 times the robot track width, and each cone is randomized to be located within a radius of 0.15 m from the specified center.
  • Figure 4: Comparison of $25$ policies learnt for reference tracking velocities ${}^{B}V_{d} \in \{0.75,0.6,0.45,0.3,0.15 \}$ m/s with waypoints sampled at distances $ds \in \{1, 0.75, 0.5, 0.25, 0.1 \}$ m. (a) Path tracking error $e_{x}$. (b) Velocity tracking error $e_{V}$. (c) Distance normalized cumulative tracking error $\mathbf{N}$.
  • Figure 5: Reward convergence failure for reference tracking velocity of ${}^{B}V_{d} = 0.15$ m/s
  • ...and 12 more figures
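
The track setup in Figure 3 and the policy grid in Figure 4 can be sketched in a few lines. The example below places left and right cones along a centreline and perturbs each one within a 0.15 m disc, then enumerates the 5 × 5 combinations of reference velocity and waypoint spacing. The helper names (`jitter_in_disc`, `place_cones`), the uniform-in-disc sampling, and the example track width are assumptions for illustration; the captions give only the 0.15 m radius, the rough 2.5 lane-to-track-width ratio, and the two value sets.

```python
import itertools
import math
import random

JITTER_RADIUS = 0.15        # m, from the Figure 3 caption
LANE_TO_TRACK_RATIO = 2.5   # lane width / robot track width ("roughly", Figure 3)


def jitter_in_disc(cx, cy, radius, rng):
    """Sample a point uniformly inside a disc (assumed distribution)."""
    r = radius * math.sqrt(rng.random())   # sqrt gives uniform area density
    phi = 2.0 * math.pi * rng.random()
    return cx + r * math.cos(phi), cy + r * math.sin(phi)


def place_cones(centerline, track_width, rng):
    """centerline: list of (x, y, heading). Returns (left, right) cone lists."""
    half = 0.5 * LANE_TO_TRACK_RATIO * track_width
    left, right = [], []
    for x, y, th in centerline:
        nx, ny = -math.sin(th), math.cos(th)   # unit normal pointing left
        left.append(jitter_in_disc(x + half * nx, y + half * ny, JITTER_RADIUS, rng))
        right.append(jitter_in_disc(x - half * nx, y - half * ny, JITTER_RADIUS, rng))
    return left, right


# Value sets from the Figure 4 caption: 5 velocities x 5 spacings = 25 policies.
velocities = [0.75, 0.6, 0.45, 0.3, 0.15]   # m/s
spacings = [1, 0.75, 0.5, 0.25, 0.1]        # m
policy_grid = list(itertools.product(velocities, spacings))

if __name__ == "__main__":
    rng = random.Random(0)
    line = [(float(i), 0.0, 0.0) for i in range(4)]       # straight test segment
    left, right = place_cones(line, track_width=0.4, rng=rng)  # 0.4 m is hypothetical
    print(left[0], right[0], len(policy_grid))
```

Passing an explicit `random.Random` instance keeps the cone layout reproducible across runs.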