Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System

Georg Schäfer, Jakob Rehrl, Stefan Huber, Simon Hirlaender

TL;DR

This paper evaluates three control strategies (PPO, MPC, and LQR/LQI) for tracking pitch references on a 1-DOF Quanser Aero 2 system. It provides a refined state-space representation for the RL setup and a rigorous evaluation protocol covering dynamic metrics, computational cost, and real-world robustness. Experimentally, LQR achieves the best steady-state accuracy, PPO delivers the fastest rise time but with notable overshoot, and MPC offers robust constraint handling at higher computational expense; transfer learning helps PPO bridge the simulation-to-real gap. The findings frame controller selection as a trade-off: LQR for precision, MPC for constraints, and PPO for fast, adaptable response. Future work aims to improve PPO robustness and to explore hybrid or data-efficient RL approaches for multi-DOF extensions.

Abstract

This study conducts a comparative analysis of Model Predictive Control (MPC) and Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, applied to a 1-Degree of Freedom (DOF) Quanser Aero 2 system. Classical control techniques such as MPC and Linear Quadratic Regulator (LQR) are widely used due to their theoretical foundation and practical effectiveness. However, with advancements in computational techniques and machine learning, DRL approaches like PPO have gained traction in solving optimal control problems through environment interaction. This paper systematically evaluates the dynamic response characteristics of PPO and MPC, comparing their performance, computational resource consumption, and implementation complexity. Experimental results show that while LQR achieves the best steady-state accuracy, PPO excels in rise-time and adaptability, making it a promising approach for applications requiring rapid response and adaptability. Additionally, we have established a baseline for future RL-related research on this specific testbed. We also discuss the strengths and limitations of each control strategy, providing recommendations for selecting appropriate controllers for real-world scenarios.
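The dynamic-response quantities named above (rise time, overshoot, steady-state accuracy) can be read off a recorded step response. The following is a minimal sketch, not the authors' evaluation code: the 10%-90% rise-time convention, the tail-averaging window, the function name `step_metrics`, and the synthetic second-order response used as stand-in data are all assumptions made for illustration.

```python
import math


def step_metrics(t, y, y0, r, tail=0.1):
    """Rise time, overshoot, and steady-state error of one step response.

    t, y : sampled time and output (e.g. pitch angle), equal length
    y0   : output value before the step
    r    : reference (target) value after the step
    tail : fraction of final samples used for the steady-state error
    All definitions here are assumed conventions, not taken from the paper.
    """
    step = r - y0
    if step == 0:
        raise ValueError("reference equals initial value; no step to analyze")

    # Normalized progress: 0 at the start value, 1 at the reference.
    # Dividing by the signed step makes negative steps work the same way.
    p = [(v - y0) / step for v in y]

    # First times at which the response crosses 10% and 90% of the step.
    t10 = next((ti for ti, pi in zip(t, p) if pi >= 0.1), None)
    t90 = next((ti for ti, pi in zip(t, p) if pi >= 0.9), None)
    rise_time = None if t10 is None or t90 is None else t90 - t10

    # Overshoot as a percentage of the step size (0 if never exceeded).
    overshoot_pct = max(0.0, max(p) - 1.0) * 100.0

    # Mean absolute tracking error over the last `tail` fraction of samples.
    n_tail = max(1, int(len(y) * tail))
    sse = sum(abs(r - v) for v in y[-n_tail:]) / n_tail

    return {
        "rise_time": rise_time,
        "overshoot_pct": overshoot_pct,
        "steady_state_error": sse,
    }


# Synthetic stand-in data: unit step response of an underdamped
# second-order system (zeta and wn are arbitrary illustrative values).
zeta, wn = 0.5, 2.0
wd = wn * math.sqrt(1.0 - zeta**2)
t = [i * 0.01 for i in range(1001)]
y = [
    1.0 - math.exp(-zeta * wn * ti)
    * (math.cos(wd * ti) + zeta / math.sqrt(1.0 - zeta**2) * math.sin(wd * ti))
    for ti in t
]

metrics = step_metrics(t, y, y0=0.0, r=1.0)
print(metrics)
```

Applied to each reference change in a recorded run, the same function would give per-step values that can be compared across controllers, subject to the assumed definitions above.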

Paper Structure (15 sections, 10 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: The Quanser Aero 2 (left) and its schematic representation (right) in a 1-DOF configuration.
  • Figure 2: Test sequence used for parameter identification (right column: detailed view of the section showing frequency and damping behavior).
  • Figure 3: Block diagram of the LQI approach.
  • Figure 4: Block diagram of the MPC approach.
  • Figure 5: Response of the control strategies (LQR, MPC, and PPO) over an 80-second run with changing target pitches ($r$). The top plot shows the pitch angle as the system tracks the target pitch sequence (0°, 5°, -5°, 20°, -20°, 40°, -40°, 0°). The bottom plot displays the corresponding control action applied by each strategy.