A Unified Approach to Multi-task Legged Navigation: Temporal Logic Meets Reinforcement Learning

Jesse Jiang; Samuel Coogan; Ye Zhao

TL;DR

This work addresses navigation for hopping robots under dynamic uncertainty by fusing formal task guarantees with reward-driven exploration. It introduces the Multi-task Product IMDP (MT-PIMDP) to couple 3D SLIP-like hopping dynamics with LTL specifications and exploration rewards, supported by a neural-network-based low-level controller and a learning framework for unknown dynamics. The authors prove a trade-off between LTL task efficiency and exploration reward and validate the approach through case studies, showing tunable prioritization via switching parameters. The framework offers a principled path to robust, probabilistic planning for legged robots in uncertain environments and can extend to other kinodynamic systems.
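The SLIP-like hopping model above can be illustrated with one small piece: the ballistic flight phase of a point-mass hopper. The sketch below starts from an apex state and computes the flight time, body position, and foot position at touchdown for given leg angles α and β. These are the angles that Figure 1 shows being controlled at contact.

It is a minimal sketch under stated assumptions, not the paper's model:
- the apex is treated as the starting state;
- α is assumed to be measured from vertical, and β is assumed to be the leg's heading in the ground plane;
- the stance (spring) phase is omitted;
- `ApexState` and `touchdown` are hypothetical names.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration [m/s^2]


@dataclass
class ApexState:
    """Hypothetical apex state of the point-mass body (vertical velocity is zero)."""
    x: float
    y: float
    z: float   # apex height of the centre of mass [m]
    vx: float  # horizontal velocity components [m/s]
    vy: float


def touchdown(apex: ApexState, alpha: float, beta: float, leg_length: float):
    """Ballistic flight from the apex until the foot reaches the ground.

    Assumed convention: alpha is the leg angle from vertical and beta the leg's
    azimuth in the x-y plane, so the foot touches the ground when the body height
    equals leg_length * cos(alpha). Returns (flight time, body position, foot position).
    """
    z_td = leg_length * math.cos(alpha)
    if z_td > apex.z:
        raise ValueError("leg would already touch the ground at the apex")
    t = math.sqrt(2.0 * (apex.z - z_td) / G)  # free-fall time from apex height to z_td
    body = (apex.x + apex.vx * t, apex.y + apex.vy * t, z_td)
    reach = leg_length * math.sin(alpha)      # horizontal distance from body to foot
    foot = (body[0] + reach * math.cos(beta),
            body[1] + reach * math.sin(beta),
            body[2] - z_td)                   # zero by construction: foot on the ground
    return t, body, foot


# Example: apex 1 m high, moving at 1 m/s along x, leg tilted 20 degrees forward.
t, body, foot = touchdown(ApexState(0.0, 0.0, 1.0, 1.0, 0.0), math.radians(20), 0.0, 0.5)
print(round(t, 3), body, foot)
```

Because the foot placement depends only on the leg angles and the flight state, choosing α and β at contact is what steers the next hop in this simplified picture.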

Abstract

This study examines the problem of hopping robot navigation planning to achieve simultaneous goal-directed and environment exploration tasks. We consider a scenario in which the robot has mandatory goal-directed tasks defined using Linear Temporal Logic (LTL) specifications as well as optional exploration tasks represented using a reward function. Additionally, there exists uncertainty in the robot dynamics which results in motion perturbation. We first propose an abstraction of 3D hopping robot dynamics which enables high-level planning and a neural-network-based optimization for low-level control. We then introduce a Multi-task Product IMDP (MT-PIMDP) model of the system and tasks. We propose a unified control policy synthesis algorithm which enables both task-directed goal-reaching behaviors as well as task-agnostic exploration to learn perturbations and reward. We provide a formal proof of the trade-off induced by prioritizing either LTL or RL actions. We demonstrate our methods with simulation case studies in a 2D world navigation environment.
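A product IMDP pairs each abstract robot state with an automaton state that tracks progress on the LTL task. From a product state (s, q), an action leading to s' moves the automaton to δ(q, L(s')), and the probability interval is copied over unchanged. The sketch below builds that product for a toy three-state IMDP and a hand-written automaton for "avoid the hazard until the goal".

It only illustrates the standard product construction under these assumptions. The paper's MT-PIMDP additionally carries exploration reward, and every state, label, and function name here is hypothetical.

```python
# Toy IMDP (hypothetical): trans[(state, action)] = {successor: (p_low, p_up)}
imdp_trans = {
    ("s0", "hop"): {"s1": (0.7, 0.9), "s2": (0.1, 0.3)},
    ("s1", "hop"): {"s1": (1.0, 1.0)},
    ("s2", "hop"): {"s2": (1.0, 1.0)},
}
labels = {"s0": frozenset(), "s1": frozenset({"goal"}), "s2": frozenset({"hazard"})}


def automaton_step(q, label):
    """Hypothetical deterministic automaton for 'avoid hazard until goal'.

    q0: task pending, q1: task satisfied, q2: violation (trap).
    Under the convention 'visit the first set finitely often and the second
    infinitely often', the Rabin pair ({q2}, {q1}) encodes acceptance.
    """
    if q == "q0":
        if "hazard" in label:
            return "q2"
        if "goal" in label:
            return "q1"
        return "q0"
    return q  # q1 and q2 are absorbing


def build_product(trans, labels, delta, q_init, s_init):
    """Return (initial product state, {((s, q), a): {(s', q'): (lo, up)}}),
    exploring only product states reachable from the initial one."""
    init = (s_init, delta(q_init, labels[s_init]))
    prod, seen, frontier = {}, {init}, [init]
    while frontier:
        s, q = frontier.pop()
        for (src, action), successors in trans.items():
            if src != s:
                continue
            out = {}
            for s_next, interval in successors.items():
                pq = (s_next, delta(q, labels[s_next]))  # automaton reads the label of s'
                out[pq] = interval                       # interval is inherited from the IMDP
                if pq not in seen:
                    seen.add(pq)
                    frontier.append(pq)
            prod[((s, q), action)] = out
    return init, prod


init, prod = build_product(imdp_trans, labels, automaton_step, "q0", "s0")
for (state, action), successors in sorted(prod.items()):
    print(state, action, successors)
```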
Paper Structure

This paper contains 23 sections, 1 theorem, 20 equations, 4 figures, and 1 table.

Introduction
Related Work
Contributions
Preliminaries
SLIP Hopping Model
Temporal Logic Planning
Problem Statement
IMDP Abstraction of Hopping Robot
State Abstraction
Learning-based Controller
Synthesis Approach
Multi-task PIMDP
Control Policy Synthesis
Nonviolating Environment-Exploration Control Policy
Goal-Reaching Control Policy
...and 8 more sections

Key Result

Theorem 1

Given the $\epsilon$-greedy switching policy (equation "Epsilon Switching") used together with the goal-reaching policy of Section "Goal-Reaching Control Policy", there is a trade-off between the speed of satisfying the LTL tasks and the maximization of the achieved reward.
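The trade-off in Theorem 1 can be made concrete with a toy Monte Carlo sketch. This is not the paper's setting; it rests on the following assumptions:
- at every step the robot takes the exploration (RL) action with probability ε (assumed here to be the probability of choosing the RL action), which earns reward but makes no progress;
- otherwise it takes the goal-reaching (LTL) action, which advances one cell towards a goal N cells away.

Under this model the expected episode length is N/(1-ε) and the expected reward is rεN/(1-ε). Raising ε therefore buys reward at the cost of slower task satisfaction. The corridor model and the parameters `n_cells` and `reward_per_explore` are hypothetical.

```python
import random


def run_episode(eps, n_cells, reward_per_explore, rng):
    """Simulate one episode of the toy epsilon-greedy switch; return (steps, reward)."""
    progress, steps, reward = 0, 0, 0.0
    while progress < n_cells:
        steps += 1
        if rng.random() < eps:   # RL action: collect reward, no progress toward the goal
            reward += reward_per_explore
        else:                    # LTL action: move one cell toward the goal
            progress += 1
    return steps, reward


def average_performance(eps, n_cells=10, reward_per_explore=1.0, episodes=5000, seed=0):
    """Monte Carlo estimate of (mean steps to reach the goal, mean reward)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1) for the goal to be reached")
    rng = random.Random(seed)
    total_steps, total_reward = 0, 0.0
    for _ in range(episodes):
        steps, reward = run_episode(eps, n_cells, reward_per_explore, rng)
        total_steps += steps
        total_reward += reward
    return total_steps / episodes, total_reward / episodes


def expected_performance(eps, n_cells=10, reward_per_explore=1.0):
    """Closed form for the toy model: steps = N/(1-eps), reward = r*eps*N/(1-eps)."""
    return n_cells / (1.0 - eps), reward_per_explore * eps * n_cells / (1.0 - eps)


for eps in (0.0, 0.3, 0.6):
    print(eps, average_performance(eps), expected_performance(eps))
```

Running it shows both the mean episode length and the mean reward growing with ε, in line with the closed form.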

Figures (4)

Figure 1: Illustration of the dynamic motion for hopping robots considered in this work. At the initial contact state $\xi_{c,k}$, the trajectory of the next hop can be determined via forward simulation of the dynamics until the interstitial state $\xi_{i,k}$. The objective of the learned controller is to achieve the desired next interstitial state $\xi_{i,k+1}$ by controlling the leg angles $\alpha_k,\beta_k$ at the next contact state $\xi_{c,k+1}$.
Figure 2: Overall block diagram of the framework. The NN-based Optimization block and its connections (in blue) are expanded in Figure 3.
Figure 3: Diagram of the complete controller framework. A switch at the high level chooses between LTL goal-reaching and RL reward-maximizing actions, and a switch at the low level minimizes locomotion deviation by using a backup action if necessary.
Figure 4: State space and results of the case studies. The initial region is in blue, the target region is green, and the hazard regions are red. Additionally, we have optional tasks with optional goals labeled in purple ("A" states with reward 20, and "B" states with reward 5), and weak hazards with penalty -0.5 labeled in pink. (a): In this case, the reward function is known. (b): A known reward environment similar to (a), but with the high-value "A" states now concentrated in the bottom right region of the environment. (c): The robot traverses the same environment as in (a), but the reward function is unknown. The depicted trajectory shows the tenth run of the robot after it has learned the reward function over nine previous runs. (d): The unknown reward version of (b), again depicting the tenth run through the environment.
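In cases (c) and (d) of Figure 4, the reward function is unknown and is learned over repeated runs. One simple way to do this is to keep a running mean of the reward observed in each abstract state. Unvisited states then receive an optimistic default, so that a reward-maximizing planner is drawn to them.

This is a hedged sketch, not the paper's learning rule. The state names and the optimistic value are hypothetical; the rewards 20, 5, and -0.5 are reused from the caption.

```python
class RewardEstimator:
    """Running-mean estimate of the reward in each abstract state.

    Unvisited states return an optimistic default (a hypothetical choice), which
    makes a reward-maximising planner inclined to visit them and learn their value.
    """

    def __init__(self, optimistic_value: float = 25.0):
        self.optimistic_value = optimistic_value  # above the largest known reward (20)
        self.counts: dict = {}
        self.means: dict = {}

    def update(self, state, observed_reward: float) -> None:
        n = self.counts.get(state, 0) + 1
        mean = self.means.get(state, 0.0)
        self.counts[state] = n
        self.means[state] = mean + (observed_reward - mean) / n  # incremental mean

    def estimate(self, state) -> float:
        return self.means.get(state, self.optimistic_value)


# Example: rewards observed along one run ("A" = 20, "B" = 5, weak hazard = -0.5).
est = RewardEstimator()
for state, r in [("A1", 20.0), ("B1", 5.0), ("W1", -0.5), ("W1", -0.5)]:
    est.update(state, r)
print(est.estimate("A1"), est.estimate("W1"), est.estimate("A2"))
```

After each run, the estimates would replace the unknown reward in the planner, so later runs can favor states already known to be valuable.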

Theorems & Definitions (13)

Definition 1: Interstitial State
Definition 2: Deterministic Rabin Automaton
Definition 3: Interval Markov Decision Process
Definition 4: PIMDP
Definition 5: IMDP Abstraction of Hopping Dynamics
Definition 6: Multi-task PIMDP
Definition 7: Control Policy
Definition 8: MT-PIMDP Adversary
Definition 9: Nonviolating MT-PIMDP
Definition 10: End Component (Baier and Katoen, 2008)
...and 3 more
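Definitions 3 and 8 concern IMDPs, whose transition probabilities are only known to lie in intervals, and an adversary that resolves those intervals. A standard way to compute pessimistic reachability on such models is robust value iteration. The controller maximizes over actions, while the adversary's inner minimization is solved greedily: every successor starts at its lower bound, and the remaining probability mass goes to the lowest-value successors first.

The sketch below applies this to a hypothetical three-state IMDP. It is a generic illustration of IMDP reachability, not the paper's MT-PIMDP synthesis algorithm.

```python
def worst_case_expectation(intervals, values):
    """Adversary's inner problem: minimise sum_j p_j * values[j]
    subject to lo_j <= p_j <= up_j and sum_j p_j = 1.

    intervals: {successor: (lo, up)}, assumed to satisfy sum(lo) <= 1 <= sum(up).
    """
    probs = {s: lo for s, (lo, _up) in intervals.items()}
    remaining = 1.0 - sum(probs.values())
    # Give the free probability mass to the lowest-value successors first.
    for s in sorted(intervals, key=lambda j: values[j]):
        if remaining <= 0.0:
            break
        lo, up = intervals[s]
        extra = min(up - lo, remaining)
        probs[s] += extra
        remaining -= extra
    return sum(p * values[s] for s, p in probs.items())


def pessimistic_reachability(trans, states, targets, iters=200):
    """Max over actions, min over adversaries, of the probability of eventually
    reaching `targets`, computed by robust value iteration.

    trans: {(state, action): {successor: (lo, up)}}
    """
    v = {s: 1.0 if s in targets else 0.0 for s in states}
    for _ in range(iters):
        new_v = {}
        for s in states:
            if s in targets:
                new_v[s] = 1.0
                continue
            # Controller maximises over actions; the adversary minimises within each action.
            options = [worst_case_expectation(succ, v)
                       for (src, _a), succ in trans.items() if src == s]
            new_v[s] = max(options, default=0.0)
        v = new_v
    return v


# Hypothetical IMDP: a fast but risky hop versus a slow, safe hop.
example_trans = {
    ("s0", "fast_hop"): {"s0": (0.0, 0.2), "goal": (0.5, 0.9), "hazard": (0.1, 0.3)},
    ("s0", "safe_hop"): {"goal": (0.3, 0.4), "s0": (0.6, 0.7)},
    ("goal", "stay"): {"goal": (1.0, 1.0)},
    ("hazard", "stay"): {"hazard": (1.0, 1.0)},
}
values = pessimistic_reachability(example_trans, ["s0", "goal", "hazard"], {"goal"})
print(values)
```

With these numbers the safe hop is preferred. Under the fast hop, the adversary can push up to 30% of the probability mass into the hazard, whereas the safe hop never reaches it.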