Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System
Georg Schäfer, Jakob Rehrl, Stefan Huber, Simon Hirlaender
TL;DR
This paper evaluates three control strategies (PPO, MPC, and LQR/LQI) for pitch-reference tracking on a 1-DOF Quanser Aero 2 system. It provides a refined state-space representation for the RL setup and a rigorous evaluation protocol covering dynamic metrics, computational cost, and real-world robustness. Experimentally, LQR achieves the best steady-state accuracy, PPO delivers the fastest rise-time but with notable overshoot, and MPC offers robust constraint handling at higher computational expense; transfer learning helps PPO bridge the simulation-to-real gap. The findings frame controller selection as a set of trade-offs: LQR for precision, MPC for constraints, and PPO for fast, adaptable response. Future work aims to improve PPO robustness and to explore hybrid or data-efficient RL approaches for multi-DOF extensions.
Abstract
This study conducts a comparative analysis of Model Predictive Control (MPC) and Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, applied to a 1-Degree-of-Freedom (DOF) Quanser Aero 2 system. Classical control techniques such as MPC and the Linear Quadratic Regulator (LQR) are widely used due to their theoretical foundation and practical effectiveness. However, with advances in computational techniques and machine learning, DRL approaches like PPO have gained traction for solving optimal control problems through interaction with the environment. This paper systematically evaluates the dynamic response characteristics of PPO and MPC, comparing their performance, computational resource consumption, and implementation complexity. Experimental results show that while LQR achieves the best steady-state accuracy, PPO excels in rise-time and adaptability, making it a promising approach for applications that require a rapid response. Additionally, we establish a baseline for future RL-related research on this specific testbed. We also discuss the strengths and limitations of each control strategy and provide recommendations for selecting appropriate controllers in real-world scenarios.
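The dynamic response characteristics named above (rise-time, overshoot, steady-state accuracy) can be computed from a recorded step response. The following is a minimal sketch under stated assumptions, not the evaluation protocol used in this paper: the function name `step_metrics`, the 10% to 90% rise-time definition, and the tail-averaging window are illustrative choices, and the signal is a synthetic underdamped second-order response rather than Quanser Aero 2 data.

```python
import numpy as np


def step_metrics(t, y, ref, tail_frac=0.1):
    """Return (rise_time, overshoot_percent, steady_state_error).

    Assumptions (illustrative, not taken from the paper):
    - the step goes from 0 to a positive reference `ref`,
    - the response actually reaches 90% of `ref`,
    - rise time is the 10%-90% crossing interval,
    - steady state is the mean of the last `tail_frac` of the samples.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    # Index of the first sample at or above 10% and 90% of the reference.
    i10 = int(np.argmax(y >= 0.1 * ref))
    i90 = int(np.argmax(y >= 0.9 * ref))
    rise_time = t[i90] - t[i10]

    # Peak excursion above the reference, as a percentage of the reference.
    overshoot = max(0.0, (y.max() - ref) / ref * 100.0)

    # Absolute offset between the reference and the averaged tail.
    n_tail = max(1, int(len(y) * tail_frac))
    ss_error = abs(ref - y[-n_tail:].mean())
    return rise_time, overshoot, ss_error


# Synthetic unit-step response of an underdamped second-order system
# (hypothetical damping ratio and natural frequency, for illustration only).
zeta, wn = 0.5, 2.0
wd = wn * np.sqrt(1.0 - zeta**2)
t = np.linspace(0.0, 20.0, 20001)
y = 1.0 - np.exp(-zeta * wn * t) * (
    np.cos(wd * t) + zeta / np.sqrt(1.0 - zeta**2) * np.sin(wd * t)
)

rise_time, overshoot, ss_error = step_metrics(t, y, ref=1.0)
print(f"rise time: {rise_time:.3f} s, overshoot: {overshoot:.2f} %, "
      f"steady-state error: {ss_error:.2e}")
```

Applying the same function to the logged pitch trajectories of each controller would give directly comparable numbers, provided all controllers are driven with the same reference step and sampled at the same rate.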
