Offline Supervised Learning V.S. Online Direct Policy Optimization: A Comparative Study and A Unified Training Paradigm for Neural Network-Based Optimal Feedback Control
Yue Zhao, Jiequn Han
TL;DR
This work benchmarks two neural-network-based strategies for optimal feedback control: offline supervised learning (SL), which learns from precomputed open-loop solutions, and online direct policy optimization (DO), which optimizes a closed-loop policy directly on the original optimal control problem (OCP). It shows that SL generally achieves near-optimal performance with far lower training time, while DO benefits from good initialization but is sensitive to the optimization landscape, especially over longer horizons. To leverage the strengths of both, the authors propose a unified Pre-train and Fine-tune paradigm: pre-train with SL on an open-loop dataset, then fine-tune online with DO, which improves performance and robustness on challenging tasks. Experiments on satellite attitude control and quadrotor landing confirm that SL outperforms DO in many settings and that fine-tuning consistently enhances robustness, reducing the data and time required to approach optimal control. The work provides practical guidance for training neural network-based optimal feedback controllers and releases code to facilitate replication and further research.
Abstract
This work is concerned with efficiently training neural network-based feedback controllers for optimal control problems. We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization. Although training in the supervised learning approach is relatively easy, the success of the method depends heavily on the optimal control dataset generated by open-loop optimal control solvers. In contrast, direct policy optimization turns the optimal control problem directly into an optimization problem without requiring any precomputation, but the dynamics-related objective can be hard to optimize when the problem is complicated. Our results underscore the superiority of offline supervised learning in terms of both optimality and training time. To overcome the main challenge of each approach, the dataset for supervised learning and the optimization for direct policy optimization, we combine the two and propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control, which further improves performance and robustness significantly. Our code is accessible at https://github.com/yzhao98/DeepOptimalControl.
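To make the two training modes and their combination concrete, the following is a minimal sketch on a toy problem. It is not the authors' implementation: the scalar dynamics, the quadratic cost, the one-parameter linear policy standing in for a neural network, and all names (`closed_loop_cost`, `open_loop_dataset`, `pretrain_sl`, `finetune_do`) are assumptions chosen for illustration. A backward Riccati recursion plays the role of the open-loop solver, and finite differences stand in for differentiating through the dynamics.

```python
# Toy problem (illustrative assumption, not from the paper):
#   dynamics x[t+1] = x[t] + DT * u[t], cost = sum_t DT * (x[t]^2 + u[t]^2).
DT, T = 0.1, 20
X0S = [-1.0, -0.5, 0.5, 1.0]  # training initial states

def closed_loop_cost(k):
    """Mean cost of the feedback policy u = -k * x over the initial states."""
    total = 0.0
    for x in X0S:
        for _ in range(T):
            u = -k * x
            total += DT * (x * x + u * u)
            x += DT * u
    return total / len(X0S)

def open_loop_dataset():
    """Stand-in for an open-loop solver: a backward Riccati recursion yields
    optimal (state, control) pairs along each trajectory."""
    p, gains = 0.0, []
    for _ in range(T):               # backward in time, zero terminal cost
        g = p / (1.0 + DT * p)       # optimal time-varying gain
        gains.append(g)
        p = DT + p * (1.0 - DT * g)  # Riccati update
    gains.reverse()                  # gains[t] is now the gain at step t
    data = []
    for x in X0S:
        for g in gains:
            u = -g * x
            data.append((x, u))
            x += DT * u
    return data

def pretrain_sl(data):
    """Offline SL: least-squares fit of the policy u = -k * x to the dataset."""
    return -sum(x * u for x, u in data) / sum(x * x for x, _ in data)

def finetune_do(k, steps=50, lr=0.1, eps=1e-4):
    """Online DO: gradient descent on the closed-loop cost itself
    (central finite differences replace backpropagation through dynamics)."""
    for _ in range(steps):
        grad = (closed_loop_cost(k + eps) - closed_loop_cost(k - eps)) / (2 * eps)
        k -= lr * grad
    return k

k_sl = pretrain_sl(open_loop_dataset())   # pre-train on open-loop data
k_ft = finetune_do(k_sl)                  # fine-tune on the original objective
cost_sl, cost_ft = closed_loop_cost(k_sl), closed_loop_cost(k_ft)
print(f"SL only:    k={k_sl:.4f}, cost={cost_sl:.6f}")
print(f"SL then DO: k={k_ft:.4f}, cost={cost_ft:.6f}")
```

In this sketch the supervised fit only imitates the solver's controls, so it does not exactly minimize the closed-loop cost of the restricted policy class; the fine-tuning stage starts from that fit and descends on the closed-loop cost directly. The toy problem is far too simple to reproduce the optimization difficulties the paper attributes to direct policy optimization on harder tasks.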
