Necessary and Sufficient Conditions for the Optimization-Based Concurrent Execution of Learned Robotic Tasks

Sheikh A. Tahmid, Gennaro Notomista

TL;DR

This work studies when multiple RL-learned tasks encoded as value functions can be executed concurrently by an optimization-based controller for control-affine systems. It introduces formal definitions of concurrent executability and controllability and proves necessary and sufficient conditions for concurrent execution within state-space subsets using a pointwise min-norm controller, linking feasibility to common equilibrium sets and Lie-derivative conditions. The authors extend the framework to handle discounted value functions by modifying the constraint margins to $\bar{\sigma}_i(x)$, aligning with the Hamilton-Jacobi-Bellman equation, and demonstrate the approach on planar-robot simulations. Together, these results provide a rigorous basis for when learned control tasks can be combined, when they inherently co-occur, or when concurrency is impossible, with potential broader impact on multi-objective RL analysis and RL-control integration.
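As a rough illustration of the pointwise min-norm controller mentioned above, the sketch below picks the smallest-norm input that satisfies one linear constraint per task. Everything specific in it is an assumption made for illustration, not taken from the paper: the single-integrator dynamics $\dot{x} = u$ (so $f = 0$, $g = I$), the quadratic value functions $J_i(x) = \|x - p_i\|^2$, the constraint form $\nabla J_i(x) \cdot u \le -J_i(x)$, and the names `min_norm_input` and `task_constraints`. The solver enumerates candidate active sets, which only works because the input is two-dimensional.

```python
import math

def min_norm_input(A, b, tol=1e-9):
    """Smallest-norm u in R^2 with A[i] . u <= b[i] for all i, or None if infeasible.

    In two dimensions the minimizer is either u = 0, the projection of the
    origin onto one constraint boundary, or the intersection of two boundaries,
    so it is enough to test those candidates.
    """
    def feasible(u):
        return all(a[0] * u[0] + a[1] * u[1] <= bi + tol for a, bi in zip(A, b))

    candidates = [(0.0, 0.0)]
    n = len(A)
    for i in range(n):
        norm2 = A[i][0] ** 2 + A[i][1] ** 2
        if norm2 > tol:
            s = b[i] / norm2  # projection of the origin onto A[i] . u = b[i]
            candidates.append((s * A[i][0], s * A[i][1]))
    for i in range(n):
        for j in range(i + 1, n):
            det = A[i][0] * A[j][1] - A[i][1] * A[j][0]
            if abs(det) > tol:  # boundaries intersect in a single point
                u0 = (b[i] * A[j][1] - A[i][1] * b[j]) / det
                u1 = (A[i][0] * b[j] - b[i] * A[j][0]) / det
                candidates.append((u0, u1))

    feasible_candidates = [u for u in candidates if feasible(u)]
    if not feasible_candidates:
        return None
    return min(feasible_candidates, key=lambda u: math.hypot(u[0], u[1]))

def task_constraints(x, goals):
    """Hypothetical tasks J_i(x) = ||x - goal_i||^2 under x_dot = u.

    Assumed constraint per task: grad J_i(x) . u <= -J_i(x).
    """
    A, b = [], []
    for gx, gy in goals:
        dx, dy = x[0] - gx, x[1] - gy
        A.append((2 * dx, 2 * dy))
        b.append(-(dx * dx + dy * dy))
    return A, b

# Goals in compatible directions: a common input exists.
u_ok = min_norm_input(*task_constraints((0, 0), [(1, 0), (0, 1)]))
# Goals in opposite directions: the two constraints cannot both hold.
u_stuck = min_norm_input(*task_constraints((0, 0), [(1, 0), (-1, 0)]))
print(u_ok, u_stuck)
```

In this toy setting, a state where the solver returns `None` is one where no single input decreases both value functions at the assumed rate, which is the kind of pointwise infeasibility the conditions in the paper are meant to characterize.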

Abstract

In this work, we consider the problem of executing multiple tasks encoded by value functions, each learned through Reinforcement Learning, using an optimization-based framework. Prior works developed such a framework but left unanswered the fundamental question of when learned value functions can be concurrently executed. The main contribution of this work is a set of theorems that provide necessary and sufficient conditions for concurrently executing sets of learned tasks within subsets of the state space, using a previously proposed min-norm controller. These theorems provide insight into when learned control tasks can be made concurrently executable, when they may already be inherently concurrently executable, and when it is not possible at all to make a set of learned tasks concurrently executable using the previously proposed methods. An additional contribution of this work is an extension of the optimization-based framework to value functions trained with a discount factor, making the overall framework more compatible with standard RL practices.

Paper Structure

This paper contains 13 sections, 4 theorems, 12 equations, 2 figures.

Key Result

Lemma 1

Let $\tilde{J}_1, \ldots, \tilde{J}_N$ be a set of learned value functions as described in (2). Let $A \subset \mathcal{X}$ be a compact set. Assume that $\forall x \in A$, $\exists u \in \mathbb{R}^m$ such that $\dot{x} = f(x) + g(x) u \in T_A(x)$, the tangent cone to $A$ at $x$, and $L_f \tilde{J} \ldots$

Figures (2)

  • Figure 1: Heatmaps of $\tilde{J}_1$, $\tilde{J}_2$ and $\tilde{J}_3$, respectively.
  • Figure 2: Heatmaps of the angle between the gradients of $\tilde{J}_1$ and $\tilde{J}_2$ (left, Example 1) and between the gradients of $\tilde{J}_2$ and $\tilde{J}_3$ (right, Example 2). Points closer to red are points where the two value functions in the scenario are not concurrently executable. Both images show, in white, trajectories of the min-norm controller executing both tasks in each scenario from different initial states; the gray circle marks the final position of each trajectory. In Example 1, the trajectory gets stuck because it ends up in a state that is not concurrently executable. In Example 2, one trajectory gets stuck, while the other, which starts close enough to the concurrently executable region, reaches one of the four goal points.
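
The quantity plotted in Figure 2, the angle between the gradients of two value functions, can be approximated numerically at any state. The sketch below is only an illustration: the helper names `numerical_gradient` and `gradient_angle_deg`, the finite-difference step, and the two quadratic stand-in value functions are assumptions, not the learned $\tilde{J}_i$ from the paper. It also assumes, without confirmation from the caption, that angles near 180 degrees are the ones flagged as problematic.

```python
import math

def numerical_gradient(J, x, h=1e-5):
    """Central-difference gradient of a scalar function J at the point x."""
    grad = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((J(xp) - J(xm)) / (2 * h))
    return grad

def gradient_angle_deg(g1, g2):
    """Angle in degrees between two gradient vectors, or None if either is zero."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return None  # the angle is undefined at a critical point
    cosine = max(-1.0, min(1.0, dot / (n1 * n2)))  # guard against rounding
    return math.degrees(math.acos(cosine))

# Hypothetical stand-ins for two learned value functions on a planar state.
def J_a(x):
    return (x[0] - 1.0) ** 2 + x[1] ** 2

def J_b(x):
    return (x[0] + 1.0) ** 2 + x[1] ** 2

# Midway between the two minima the gradients point in opposite directions.
angle_between = gradient_angle_deg(
    numerical_gradient(J_a, [0.0, 0.0]),
    numerical_gradient(J_b, [0.0, 0.0]),
)
# Far from both minima the gradients are nearly aligned.
angle_far = gradient_angle_deg(
    numerical_gradient(J_a, [0.0, 50.0]),
    numerical_gradient(J_b, [0.0, 50.0]),
)
print(angle_between, angle_far)
```

Evaluating `gradient_angle_deg` over a grid of states would give a heatmap of the same kind as the figure; for value functions represented by neural networks, automatic differentiation would normally replace the finite-difference gradient.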

Theorems & Definitions (16)

  • Definition 1
  • Definition 2
  • Remark 1
  • Definition 3
  • Definition 4
  • Remark 2
  • Lemma 1
  • proof
  • Proposition 1
  • proof
  • ...and 6 more