Direct transfer of optimized controllers to similar systems using dimensionless MPC

Josip Kir Hromatko, Shambhuraj Sawant, Šandor Ileš, Sébastien Gros

TL;DR

Transferring a controller directly between scaled systems is often challenging, even when the systems are dynamically similar. The paper presents a dimensionless MDP/MPC framework that nondimensionalizes dynamics, costs, and constraints, enabling zero-shot transfer of tuned controllers between dynamically similar systems. The policy is tuned in the dimensionless domain via reinforcement learning or Bayesian optimization, and the method is demonstrated on cartpole and race-car tasks with successful cross-scale transfer. The approach allows learning from multi-scale data and can substantially reduce costly full-scale experiments while maintaining closed-loop performance.

Abstract

Scaled model experiments are commonly used in various engineering fields to reduce experimentation costs and overcome constraints associated with full-scale systems. The relevance of such experiments relies on dimensional analysis and the principle of dynamic similarity. However, transferring controllers to full-scale systems often requires additional tuning. In this paper, we propose a method to enable a direct controller transfer using dimensionless model predictive control, tuned automatically for closed-loop performance. With this reformulation, the closed-loop behavior of an optimized controller transfers directly to a new, dynamically similar system. Additionally, the dimensionless formulation allows for the use of data from systems of different scales during parameter optimization. We demonstrate the method on a cartpole swing-up and a car racing problem, applying either reinforcement learning or Bayesian optimization for tuning the controller parameters. Software used to obtain the results in this paper is publicly available at https://github.com/josipkh/dimensionless-mpcrl.
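
The idea that a controller expressed in dimensionless variables carries over unchanged to a dynamically similar system can be illustrated with a much simpler setup than the one in the paper. The sketch below is not the authors' implementation: it assumes a torque-driven point-mass pendulum, a PD law in place of MPC, forward-Euler integration, and hand-picked gains `k1` and `k2`. The characteristic time `sqrt(length / G)` and characteristic torque `mass * G * length` are the scaling choices assumed here for illustration.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]


def simulate(mass, length, k1, k2, theta0=1.0, n_steps=2000, dtau=0.005):
    """Simulate a point-mass pendulum under a dimensionless PD law.

    Hypothetical illustration: the gains k1, k2 act on dimensionless
    quantities, so the same values are reused for every pendulum size.
    Returns the angle trajectory sampled at equal dimensionless steps.
    """
    t_c = math.sqrt(length / G)  # characteristic time [s] (assumed scaling)
    u_c = mass * G * length      # characteristic torque [N m] (assumed scaling)
    dt = dtau * t_c              # dimensional step matching the step dtau

    theta, omega = theta0, 0.0   # angle from the downward position, rate
    trajectory = [theta]
    for _ in range(n_steps):
        omega_star = omega * t_c                 # dimensionless angular rate
        u_star = -k1 * theta - k2 * omega_star   # dimensionless control law
        u = u_c * u_star                         # torque applied to the plant
        # Dimensional dynamics: m l^2 * alpha = -m g l sin(theta) + u
        alpha = -(G / length) * math.sin(theta) + u / (mass * length**2)
        theta, omega = theta + dt * omega, omega + dt * alpha
        trajectory.append(theta)
    return trajectory


# Same dimensionless gains, two very different physical scales.
small = simulate(mass=0.1, length=0.2, k1=4.0, k2=2.0)
large = simulate(mass=50.0, length=3.0, k1=4.0, k2=2.0)
max_gap = max(abs(a - b) for a, b in zip(small, large))
```

Under these assumptions both pendulums obey the same dimensionless closed-loop equation, so `max_gap` is limited only by floating-point round-off. The two trajectories coincide when compared at equal dimensionless times, while the corresponding physical durations differ by the ratio of the characteristic times.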

Paper Structure

This paper contains 26 sections, 32 equations, 4 figures.

Figures (4)

  • Figure 1: The cartpole system from anand2023.
  • Figure 2: The progress of validation score, showing the mean and standard deviation of 5 independent runs. Note that the score corresponds to the negative MDP cost (i.e., higher is better). A value above ca. 18 indicates a successful swing-up and balancing.
  • Figure 3: Results of 50 trials using the small-scale vehicle, with a time limit of 10 seconds. The red line indicates the best lap time obtained up to the specific trial.
  • Figure 4: Full-size vehicle trajectory using the parameters optimized on a small-scale vehicle.

Theorems & Definitions (1)

  • Definition 1: Similar MDPs