Towards an Adaptable and Generalizable Optimization Engine in Decision and Control: A Meta Reinforcement Learning Approach

Sungwook Yang, Chaoying Pei, Ran Dai, Chuangchuang Sun

TL;DR

The paper addresses how to update sampling-based MPC controllers in non-stationary environments by introducing a learnable optimizer based on meta-reinforcement learning. The optimizer learns to update MPPI controllers without expert demonstrations, enabling few-shot adaptation across a distribution of control tasks. A meta-optimizer is trained across tasks and adapted with per-task gradient updates; the paper reports faster adaptation and improved trajectory performance compared with a vanilla RL optimizer. The approach reduces dependence on manual tuning and expert data.

Abstract

Sampling-based model predictive control (MPC) has found significant success in optimal control problems with non-smooth system dynamics and cost functions. Many machine learning-based works have proposed improving MPC by a) learning or fine-tuning the dynamics/cost function, or b) learning to optimize the update of the MPC controllers. For the latter, imitation learning-based optimizers are trained to update the MPC controller by mimicking expert demonstrations, which, however, are expensive or even unavailable. More significantly, many sequential decision-making problems arise in non-stationary environments, requiring that an optimizer be adaptable and generalizable so that it can update the MPC controller for different tasks. To address these issues, we propose to learn an optimizer based on meta-reinforcement learning (RL) to update the controllers. This optimizer does not need expert demonstrations and enables fast (e.g., few-shot) adaptation when it is deployed in unseen control tasks. Experimental results validate the effectiveness of the learned optimizer regarding fast adaptation.
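
For context, the kind of hand-designed controller update that the paper proposes to learn instead can be sketched with MPPI, one common sampling-based MPC method. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`mppi_update`, `rollout_cost`), the one-dimensional toy dynamics and cost, and all hyperparameter values are hypothetical choices made for this example.

```python
import numpy as np


def rollout_cost(u_seq, dynamics, cost, x0):
    """Total cost of applying a control sequence from state x0."""
    x = np.array(x0, dtype=float)
    total = 0.0
    for u in u_seq:
        total += cost(x, u)
        x = dynamics(x, u)
    return total


def mppi_update(u_nominal, dynamics, cost, x0, rng,
                num_samples=128, noise_std=1.0, temperature=1.0):
    """One MPPI-style update of a nominal control sequence (illustrative)."""
    horizon, control_dim = u_nominal.shape
    # Sample Gaussian perturbations around the nominal control sequence.
    noise = rng.normal(0.0, noise_std, size=(num_samples, horizon, control_dim))
    costs = np.array([
        rollout_cost(u_nominal + noise[k], dynamics, cost, x0)
        for k in range(num_samples)
    ])
    # Softmin weights; subtracting the minimum keeps the exponent stable.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # The cost-weighted average perturbation is added to the nominal sequence.
    return u_nominal + np.einsum("k,ktd->td", weights, noise)


# Toy problem (assumed for illustration): drive a 1-D state from 1.0 to 0.
def dynamics(x, u):
    return x + 0.1 * u


def cost(x, u):
    return float(x @ x + 0.01 * (u @ u))


rng = np.random.default_rng(0)
x0 = np.array([1.0])
u = np.zeros((20, 1))  # horizon of 20 steps, one control input
initial_cost = rollout_cost(u, dynamics, cost, x0)
for _ in range(30):
    u = mppi_update(u, dynamics, cost, x0, rng)
final_cost = rollout_cost(u, dynamics, cost, x0)
print(f"cost before: {initial_cost:.2f}, after: {final_cost:.2f}")
```

Here the update rule (softmin weighting with a fixed temperature and noise scale) is fixed by hand. A learned optimizer in the sense of the abstract would instead produce the update from data, which is where adaptation across tasks becomes relevant.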

Paper Structure

This paper contains 5 sections, 4 equations, 3 figures, and 1 algorithm.

Figures (3)

  • Figure 1: Meta-RL Based Learnable Optimizer.
  • Figure 2: Trajectory Comparison.
  • Figure 3: Trajectory Error.