Mission-Aligned Learning-Informed Control of Autonomous Systems: Formulation and Foundations

Vyacheslav Kungurtsev, Monicah Cherop Naibei, Gustav Sir, Akhil Anand, Sebastien Gros, Haozhe Tian, Homayoun Hamedmoghadam

Abstract

Research, innovation, and capital investment aimed at realizing autonomous physical agents have been increasing rapidly. Such agents include industrial and service robots, unmanned aerial vehicles, embedded control devices, and other cybernetic and mechatronic implementations of intelligent autonomous devices. In this paper, we consider a stylized version of robotic care, which would normally involve a two-level Reinforcement Learning (RL) procedure that trains a policy for both lower-level physical movement decisions and higher-level conceptual tasks and their sub-components. To deliver greater safety and reliability, we present a general formulation of this problem as a two-level optimization scheme that incorporates control at the lower level and classical planning at the higher level, integrated with a capacity for learning. This integration of multiple methodologies (control, classical planning, and RL) offers greater insight for algorithm development, leading to more efficient and reliable performance. Here, reliability refers to physical safety and to the interpretability of the otherwise black-box operation of autonomous agents, both of which concern users and regulators. This work presents the necessary background and the general formulation of the optimization framework, detailing each component and its integration with the others.

Paper Structure

This paper contains 77 sections, 93 equations, 19 figures, 2 tables.

Figures (19)

  • Figure 1: High-level methodology overview. A Scheduler (left) drives a Planner and MPC, ultimately controlling the physical Agent (right), with feedback loops returning state and reward information for respective updates.
  • Figure 2: Hierarchical Bilevel RL Architecture: Visualizing the decoupling of timescales. The upper level policy assigns discrete abstract actions (e.g., feeding a human) to the lower level. The lower level evaluates continuous state feedback at a high frequency to generate physical control outputs. The resulting environmental trajectory determines the reward signals evaluated by the cost functions $F$ and $G$.
  • Figure 3: High-level Integration Schema: A learning module (bottom) collects data from the robot to update parameters of the decision-making modules (top).
  • Figure 4: A temporal model of integrating the Planning and Control Layers. The slow-timescale Scheduler assigns tasks (e.g., Task A) which spawn Planning Problem(s). The discrete Planner issues actions $\hat{a}_t$ that trigger the fast-timescale MPC. Continuous physical states $\hat{z}$ map back to discrete symbolic states $\bar{s}$ via (de-)fuzzification ($\mu$). Possible discrepancies prompt real-time re-planning.
  • Figure 5: Action-Triggered RLMPC Loop: The receding horizon principle. At each fast-timescale interval ($t, t+h, t+2h, \dots$), an optimal control problem is solved predicting the state trajectory over horizon $H$ (dashed lines). Only the first control action $\hat{v}(\tau)$ is applied, generating the true closed-loop trajectory (solid blue line). The RL Critic observes this realized performance and updates the cost parameters $\theta$.
  • ...and 14 more figures
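
The receding horizon principle described in the caption of Figure 5 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar linear model `x_next = a*x + b*u`, the quadratic stage cost `q*x**2 + theta*u**2`, the numerical values, and the function names. The cost parameter `theta` is held fixed; the RL Critic update mentioned in the caption is not shown.

```python
def first_mpc_action(x, a, b, q, theta, horizon):
    """Solve a finite-horizon scalar LQ problem; return only the first control.

    Hypothetical model (not from the paper): x_next = a*x + b*u,
    stage cost q*x**2 + theta*u**2, terminal cost q*x**2.
    """
    p = q          # cost-to-go weight at the end of the horizon
    gain = 0.0
    # Backward Riccati recursion; after the last pass, `gain` is the
    # feedback gain for the first step of the predicted trajectory.
    for _ in range(horizon):
        gain = (a * b * p) / (theta + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return -gain * x


def run_closed_loop(x0, steps, a=1.2, b=1.0, q=1.0, theta=0.1, horizon=10):
    """Re-solve the horizon problem at every step and apply only the first action."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        u = first_mpc_action(x, a, b, q, theta, horizon)  # plan over the horizon
        x = a * x + b * u                                 # apply first action only
        xs.append(x)
    return xs


if __name__ == "__main__":
    trajectory = run_closed_loop(x0=5.0, steps=20)
    print(trajectory[:4])
```

Because the assumed model is time-invariant, every re-solve yields the same gain. The loop still re-solves at each step to mirror the structure in the caption, where a new optimal control problem is posed from the current state at each interval and the remainder of the predicted trajectory is discarded.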