Necessary and Sufficient Conditions for the Optimization-Based Concurrent Execution of Learned Robotic Tasks
Sheikh A. Tahmid, Gennaro Notomista
TL;DR
This work studies when multiple RL-learned tasks encoded as value functions can be executed concurrently by an optimization-based controller for control-affine systems. It introduces formal definitions of concurrent executability and controllability and proves necessary and sufficient conditions for concurrent execution within state-space subsets using a pointwise min-norm controller, linking feasibility to common equilibrium sets and Lie-derivative conditions. The authors extend the framework to handle discounted value functions by modifying the constraint margins to $\bar{\sigma}_i(x)$, aligning with the Hamilton-Jacobi-Bellman equation, and demonstrate the approach on planar-robot simulations. Together, these results provide a rigorous basis for when learned control tasks can be combined, when they inherently co-occur, or when concurrency is impossible, with potential broader impact on multi-objective RL analysis and RL-control integration.
Abstract
In this work, we consider the problem of executing multiple tasks encoded by value functions, each learned through Reinforcement Learning, using an optimization-based framework. Prior works developed such a framework but left unanswered the fundamental question of when learned value functions can be concurrently executed. The main contribution of this work is a set of theorems that provide necessary and sufficient conditions for concurrently executing sets of learned tasks within subsets of the state space, using a previously proposed min-norm controller. These theorems provide insight into when learned control tasks can be made concurrently executable, when they may already be inherently concurrently executable, and when it is not possible at all to make a set of learned tasks concurrently executable using the previously proposed methods. An additional contribution is an extension of the optimization-based framework to value functions trained with a discount factor, making the overall framework more compatible with standard RL practices.
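To make the notion of pointwise concurrent executability concrete, the following is a minimal sketch and not the authors' implementation. It assumes a control-affine system $\dot{x} = f(x) + g(x)u$, treats each value function $V_i$ as a cost to be decreased, and assumes task constraints of the form $L_f V_i(x) + L_g V_i(x)\,u \le -\sigma_i(x)$; the sign convention, the margin choice $\sigma_i = V_i$, the quadratic value functions, and all function names are illustrative assumptions. The min-norm problem is solved here with Hildreth's dual coordinate ascent, chosen only to keep the example dependency-free.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def task_constraints(grads, f, g, sigmas):
    """Rows L_g V_i(x) and bounds -sigma_i(x) - L_f V_i(x) (assumed form)."""
    rows = [[dot(grad, col) for col in zip(*g)] for grad in grads]
    bounds = [-s - dot(grad, f) for grad, s in zip(grads, sigmas)]
    return rows, bounds


def min_norm_input(rows, bounds, sweeps=500, tol=1e-9) -> Optional[List[float]]:
    """Minimise ||u||^2 subject to rows[i] . u <= bounds[i].

    Uses Hildreth's dual coordinate ascent. Returns None when the
    constraints cannot all be met at this state.
    """
    # If L_g V_i(x) = 0 the input cannot help: the drift alone must satisfy it.
    for a, b in zip(rows, bounds):
        if dot(a, a) <= tol and b < -tol:
            return None
    lam = [0.0] * len(rows)
    u = [0.0] * len(rows[0])  # invariant: u = -sum_i lam[i] * rows[i]
    for _ in range(sweeps):
        for i, (a, b) in enumerate(zip(rows, bounds)):
            gram = dot(a, a)
            if gram <= tol:
                continue
            new = max(0.0, lam[i] + (dot(a, u) - b) / gram)
            delta = new - lam[i]
            u = [uk - delta * ak for uk, ak in zip(u, a)]
            lam[i] = new
    feasible = all(dot(a, u) <= b + 1e-6 for a, b in zip(rows, bounds))
    return u if feasible else None


def solve_at(x, goals):
    """Hypothetical example: planar single integrator (f = 0, g = I) with
    V_i(x) = 0.5 * ||x - goal_i||^2 and assumed margin sigma_i = V_i."""
    grads = [[xi - gi for xi, gi in zip(x, goal)] for goal in goals]
    sigmas = [0.5 * dot(d, d) for d in grads]
    f = [0.0, 0.0]
    g = [[1.0, 0.0], [0.0, 1.0]]
    rows, bounds = task_constraints(grads, f, g, sigmas)
    return min_norm_input(rows, bounds)


if __name__ == "__main__":
    goals = [(-1.0, 0.0), (1.0, 0.0)]
    print(solve_at((0.0, 1.0), goals))  # off the segment between the goals
    print(solve_at((0.0, 0.0), goals))  # midpoint: opposing gradients
```

In this toy setting the two constraints are compatible away from the segment joining the goals and incompatible at the midpoint, where the gradients oppose each other; when both tasks share the same goal, a single input satisfies both. Handling discounted value functions would amount to passing modified margins in place of `sigmas`, whose exact form is not reproduced here.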
