Value Iteration for Learning Concurrently Executable Robotic Control Tasks

Sheikh A. Tahmid, Gennaro Notomista

TL;DR

This work tackles concurrent task execution in redundant robotic systems by introducing a notion of task independence for learned cost-to-go functions. It develops a cost-functional formulation and a continuous fitted value iteration (CFVI) approach that trains tasks so that their gradients are linearly independent or orthogonal, enabling simultaneous execution via a min-norm controller. Theoretical results (including Propositions 2–3) connect independence and orthogonality to feasible multi-task control, and an analytic optimal input form, $u^* = -\tfrac{1}{2} R(x)^{-1} (L_g J(x))^T$, is given for the learned costs. Empirical results across multiple mobile-robot scenarios, in both simulated and physical experiments, demonstrate improved concurrency and the ability to adapt task stacks online.

Abstract

Many modern robotic systems such as multi-robot systems and manipulators exhibit redundancy, a property owing to which they are capable of executing multiple tasks. This work proposes a novel method, based on the Reinforcement Learning (RL) paradigm, to train redundant robots to be able to execute multiple tasks concurrently. Our approach differs from typical multi-objective RL methods insofar as the learned tasks can be combined and executed in possibly time-varying prioritized stacks. We do so by first defining a notion of task independence between learned value functions. We then use our definition of task independence to propose a cost functional that encourages a policy, based on an approximated value function, to accomplish its control objective while minimally interfering with the execution of higher priority tasks. This allows us to train a set of control policies that can be executed simultaneously. We also introduce a version of fitted value iteration to learn to approximate our proposed cost functional efficiently. We demonstrate our approach on several scenarios and robotic systems.

Paper Structure

This paper contains 18 sections, 3 theorems, 9 equations, 4 figures, 2 tables.

Key Result

Proposition 1

Given a control-affine system and the $k$-th iteration of an estimate of the cost-to-go function described in (4), $J^k$, the optimal policy based on the fitted value iteration update step in (1) in the continuous setting is:

$$u^* = -\tfrac{1}{2} R(x)^{-1} \left(L_g J^k(x)\right)^T$$
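As a minimal sketch of how the closed form quoted in the TL;DR, $u^* = -\tfrac{1}{2} R(x)^{-1} (L_g J(x))^T$, could be evaluated numerically, the Python snippet below assumes a control-affine system $\dot{x} = f(x) + g(x)u$ and approximates the gradient of $J$ by central differences. The helper names (`solve`, `grad`, `optimal_input`), the single-integrator dynamics, the quadratic stand-in for the learned cost and the identity weight $R$ are all illustrative assumptions, not taken from the paper.

```python
# Sketch only: system, cost, and helper names are illustrative assumptions.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def grad(J, x, h=1e-6):
    """Central-difference gradient of a scalar function J at x."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((J(xp) - J(xm)) / (2 * h))
    return out


def optimal_input(J, g, R, x):
    """Evaluate u* = -1/2 R(x)^{-1} (L_g J(x))^T at state x."""
    dJ = grad(J, x)
    G = g(x)  # n x m input matrix
    m = len(G[0])
    # (L_g J(x))^T = g(x)^T grad J(x), an m-vector.
    LgJ_T = [sum(G[i][j] * dJ[i] for i in range(len(x))) for j in range(m)]
    return [-0.5 * y for y in solve(R(x), LgJ_T)]


# Assumed example: planar single integrator (g = I) with a quadratic cost.
goal = [1.0, 2.0]
J = lambda x: sum((xi - gi) ** 2 for xi, gi in zip(x, goal))
g = lambda x: [[1.0, 0.0], [0.0, 1.0]]
R = lambda x: [[1.0, 0.0], [0.0, 1.0]]

u = optimal_input(J, g, R, [0.0, 0.0])
print(u)  # approximately [1.0, 2.0], pointing from the origin toward the goal
```

In practice $J$ would be a trained function approximator, and its gradient would more likely come from automatic differentiation than from finite differences.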

Figures (4)

  • Figure 1: (a): Heatmap of the function $\tilde{J}_1$, trained for the avoidance task. (b): Heatmap of the function $\tilde{J}_{\textrm{base}}$, trained using CFVI. (c): Heatmap of the function $\tilde{J}_{\textrm{ind}}$, trained using the proposed method to be independent of the avoidance task. Note that the gradients of $\tilde{J}_{\textrm{ind}}$ appear to be linearly independent of those of $\tilde{J}_1$, as the values in (c) appear to warp around the square region.
  • Figure 2: (a): Trajectory when combining avoidance task, $\tilde{J}_1$, with baseline go-to-point task, $\tilde{J}_{\textrm{base}}$. (b): Trajectory when combining $\tilde{J}_1$ with go-to-point task trained using proposed approach, $\tilde{J}_{\textrm{ind}}$. Green: Positions at each time step. Red: Final position for each trajectory. Blue: Region to avoid.
  • Figure 3: Robot team forming triangle while avoiding region. Coloured Dots: Robots 1-3. Lighter Blue: Region to avoid. Grey: Trajectories of each robot.
  • Figure 4: Robot team successfully forming triangle, sending first robot to origin and avoiding regions. Coloured Dots: Robots 1-3. Lighter Blue: Regions to avoid. Grey: Trajectories of each robot.

Theorems & Definitions (7)

  • Remark 1
  • Definition 1: Independent and Orthogonal Tasks
  • Remark 2
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Remark 3
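
The TL;DR ties simultaneous execution under a min-norm controller to the task gradients being linearly independent or orthogonal. One way to see why this matters is the following Python sketch. It rests on a simplifying assumption that is not taken from the paper: each task $i$ imposes one linear equality on the input, $a_i^T u = b_i$, where $a_i$ stands in for $(L_g J_i(x))^T$ and $b_i$ for a desired rate of decrease of $J_i$. The paper's controller may instead use inequalities, slack variables and priorities. Under this assumption the smallest-norm input satisfying all tasks is $u = A^T (A A^T)^{-1} b$, which exists whenever the rows of $A$ are linearly independent. The names `solve` and `min_norm_input` are hypothetical helpers.

```python
# Sketch only: the equality-constraint form and all names are assumptions.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def min_norm_input(A, b):
    """Smallest-norm u with A u = b, computed as u = A^T (A A^T)^{-1} b.

    Row i of A plays the role of L_g J_i(x). The Gram matrix A A^T is
    invertible only if the rows are linearly independent.
    """
    k, m = len(A), len(A[0])
    gram = [[sum(A[i][c] * A[j][c] for c in range(m)) for j in range(k)]
            for i in range(k)]
    lam = solve(gram, b)
    return [sum(A[i][c] * lam[i] for i in range(k)) for c in range(m)]


# Two tasks with orthogonal (hence independent) rows, three inputs.
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
b = [-1.0, -2.0]
print(min_norm_input(A, b))  # approximately [-1.0, -1.0, -1.0]

# Two tasks whose rows are parallel: no input is computed.
try:
    min_norm_input([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]], b)
except ValueError as err:
    print("dependent tasks:", err)
```

When the rows are parallel the Gram matrix is singular, so this formula yields no input. That is the kind of conflict that training for independent or orthogonal gradients is meant to avoid.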