iPolicy: Incremental Policy Algorithms for Feedback Motion Planning

Guoxiang Zhao; Devesh K. Jha; Yebin Wang; Minghui Zhu

TL;DR

This work develops iPolicy, an incremental, policy-based motion-planning framework that couples sampling-based graph construction with set-valued dynamic programming to synthesize feedback controllers for dynamical robots. By applying the Kruzhkov transform to the minimal travel time, which yields a bounded value function $\varTheta$, iPolicy performs asynchronous value iterations on an expanding graph and converges to the optimal value function $\varTheta^*$ with probability one, leveraging contraction properties over both fixed and evolving graphs. The theoretical analysis establishes convergence under specific resolution and scheduling assumptions, and experiments on a point mass, a simple car, and a Dubins car demonstrate anytime improvement of the policy in environments with obstacles. A computation-saving variant and extensive numerical results illustrate practical scalability, and parallelism and learning-based techniques are identified as avenues for acceleration in future work.
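For readers unfamiliar with the transform, the following is a short worked sketch of why it produces a contraction. It uses the standard form of the Kruzhkov transform; the symbols $T$ (minimal travel time), $f$ (system dynamics), $U$ (control set), and $K$ (goal region) are notation assumed here for illustration, and $\epsilon$ is the time step, so the paper's exact operator may differ in detail.

$$
\varTheta(x) = 1 - e^{-T(x)}, \qquad \varTheta(x) \in [0, 1], \qquad \varTheta(x) = 0 \ \text{for } x \in K .
$$

If the minimal travel time obeys the discrete-time recursion $T(x) = \epsilon + \min_{u \in U} T\big(x + \epsilon f(x,u)\big)$, then substituting $e^{-T} = 1 - \varTheta$ gives

$$
\varTheta(x) = \min_{u \in U} \Big[ 1 - e^{-\epsilon} + e^{-\epsilon}\, \varTheta\big(x + \epsilon f(x,u)\big) \Big].
$$

The right-hand side is a Bellman-type update with stage cost $1 - e^{-\epsilon}$ and discount factor $e^{-\epsilon} < 1$, so it is a contraction in the sup norm with modulus $e^{-\epsilon}$. States that cannot reach the goal have $T(x) = \infty$ and are mapped to the finite value $\varTheta(x) = 1$.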

Abstract

This paper presents policy-based motion planning for robotic systems. The motion planning literature has mostly focused on open-loop trajectory planning followed by online tracking. In contrast, we solve path planning and controller synthesis simultaneously by solving the related feedback control problem. We present a novel incremental policy (iPolicy) algorithm for motion planning, which integrates sampling-based methods and set-valued optimal control methods to compute feedback controllers for the robotic system. In particular, we use sampling to incrementally construct the state space of the system. Asynchronous value iterations are performed on the sampled state space to synthesize the incremental policy feedback controller. We show that the estimates converge to the optimal value function in the continuous state space. Numerical results on several dynamical systems (including nonholonomic systems) verify the optimality and effectiveness of iPolicy.
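The interplay of incremental sampling and value iteration described above can be illustrated with a toy sketch. This is not the authors' iPolicy algorithm: the one-dimensional system $\dot{x} = u$ with $u \in \{-1, +1\}$, the goal region $|x| \le 0.1$, the nearest-sample approximation of successor states, the initialization of new samples at the maximal value 1, and the function name `ipolicy_sketch` are all assumptions made for illustration.

```python
import math
import random


def ipolicy_sketch(n_batches=20, batch_size=10, sweeps=5, eps=0.1, seed=0):
    """Toy incremental value iteration for x' = u, u in {-1, +1}, on [-1, 1]."""
    rng = random.Random(seed)
    controls = (-1.0, 1.0)
    goal_radius = 0.1              # assumed goal region: |x| <= 0.1
    discount = math.exp(-eps)      # discount factor of the transformed update
    stage = 1.0 - discount         # transformed cost of moving for time eps
    states, theta, policy = [], [], []

    def nearest(x):
        # Index of the sampled state closest to x (brute force for clarity).
        return min(range(len(states)), key=lambda i: abs(states[i] - x))

    for _ in range(n_batches):
        # 1) Grow the sampled state set. New non-goal samples start at the
        #    maximal value 1 (an assumption of this sketch).
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            states.append(x)
            theta.append(0.0 if abs(x) <= goal_radius else 1.0)
            policy.append(None)

        # 2) Run a few in-place sweeps in random order on the current samples.
        for _ in range(sweeps):
            order = list(range(len(states)))
            rng.shuffle(order)
            for i in order:
                if abs(states[i]) <= goal_radius:
                    continue  # goal values stay fixed at 0
                best_value, best_u = 1.0, None
                for u in controls:
                    nxt = states[i] + eps * u  # constant control for time eps
                    if abs(nxt) > 1.0:
                        continue  # successor leaves the domain
                    value = stage + discount * theta[nearest(nxt)]
                    if value < best_value:
                        best_value, best_u = value, u
                theta[i], policy[i] = best_value, best_u

    return states, theta, policy
```

Calling `ipolicy_sketch()` returns the sampled states, their value estimates, and a greedy control per state, which together act as a lookup-table feedback policy. An estimated travel time can be recovered from a value `v < 1` as `-math.log(1.0 - v)`. The brute-force nearest-neighbor search keeps the sketch short; a spatial index would be the natural replacement for larger sample sets.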

Paper Structure (18 sections, 66 equations, 5 figures, 4 algorithms)

Introduction
Related Work
Notations and notions
Problem Formulation
The Incremental Policy (iPolicy) Algorithm
Algorithm statement
Performance guarantee
Analysis
Preliminaries
Contraction property on a fixed graph
Asynchronous contraction over graphs
Proof of Theorem \ref{theorem:contractiveOverPeriods}
Numerical Results and Discussion
Computation-saving query
Point mass
...and 3 more sections

Figures (5)

Figure 1: Illustration of the approximation of the dynamics and of BackProp. Orange dots are sampled states, orange crosses are the resting states reached by applying a constant control $u$ for time $\epsilon$, and blue arrows indicate discrete-time transitions.
Figure 2: The estimated value function for a point mass in the presence of obstacles and its convergence with computational time. The goal region for the point mass is centered at $(0,0)$ and outlined by a red dashed line. The colorbar in the plots represents the approximate time to the goal region. Figure \ref{fig:point-mass:error} shows the errors over $5$ independent runs; in these runs, iPolicy converges faster than the multigrid method.
Figure 3: The estimated value function computed by the proposed incremental algorithm for the simple car over $5,000$ samples, shown at orientation $\theta=0$.
Figure 4: Trajectory of the simple car performing automated parking in the cluttered environment. Yellow arrows denote the orientation of the car and red stars denote its center. The goal region is marked by a pink dashed circle.
Figure 5: The estimated value function computed by iPolicy for the Dubins car over $7500$ samples. Because the Dubins car cannot move backwards, the value function is discontinuous, which results in more complex reachable sets in the sub-Riemannian manifold.

Theorems & Definitions (21)

proof (listed 10 times)
...and 11 more
