Combined Learning and Control: A New Paradigm for Optimal Control with Unknown Dynamics
Panagiotis Kounatidis, Andreas A. Malikopoulos
TL;DR
The paper addresses optimal control with unknown dynamics by marrying model-based control with data-driven penalties in a Combined Learning and Control (CLC) framework. It develops a proxy-cost DP driven by a nominal model and a per-stage penalty vector $\boldsymbol{\beta}$, and shows when $\boldsymbol{\beta}$ can be chosen a priori or must be learned online. A learning framework estimates $\boldsymbol{\beta}$ to ensure the CLC solution matches the true optimal policy, demonstrated on a scalar LQR with unknown dynamics and benchmarked against RL methods. The work provides theoretical boundaries, an algorithmic implementation, and empirical results illustrating CLC as a practical bridge between classical control and learning, with code available for replication.
Abstract
In this paper, we present the combined learning-and-control (CLC) approach, a new way to solve optimal control problems with unknown dynamics by unifying model-based control and data-driven learning. The key idea is simple: we design a controller to be optimal for a proxy objective built on an available model while penalizing mismatches with the real system, so that the resulting controller is also optimal for the actual system. Building on the original CLC formulation, we apply the framework to the linear quadratic regulator problem and make three advances: (i) we show that the CLC penalty is a sequence of stage-specific weights rather than a single constant; (ii) we identify when these weights can be set in advance and when they must depend on the (unknown) dynamics; and (iii) we develop a lightweight learning loop that tunes the weights directly from data without abandoning the benefits of a model-based design. We provide a complete algorithm and an empirical study against common baseline methods. The results clarify where prior knowledge suffices and where learning is essential, and they position CLC as a practical, theoretically grounded bridge between classical optimal control and modern learning methods.
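The gap that the stage-specific weights are meant to close can be illustrated with a minimal Python sketch of a finite-horizon scalar LQR. This is not the CLC algorithm: the penalty construction is not given in this summary, so the sketch only compares a controller designed on a nominal model with the controller that is optimal for the true system. The function names, the dynamics values (`a_true`, `a_nominal`, `b`), the cost weights, and the horizon are all illustrative assumptions, not values from the paper.

```python
def riccati_gains(a, b, q, r, q_terminal, horizon):
    """Backward Riccati recursion for x[t+1] = a*x[t] + b*u[t]
    with stage cost q*x^2 + r*u^2 and terminal cost q_terminal*x^2.
    Returns the time-ordered feedback gains (u[t] = -k[t]*x[t]) and the
    initial cost-to-go coefficient p0, so the optimal cost is p0 * x0^2."""
    p = q_terminal
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # recursion runs backward in time
    return gains, p


def rollout_cost(a, b, q, r, q_terminal, gains, x0):
    """Cost of applying the given gains to the system (a, b) from x0."""
    x, cost = x0, 0.0
    for k in gains:
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q_terminal * x * x


# Hypothetical numbers: the designer believes a = 0.9, the real system has a = 1.2.
a_true, a_nominal, b = 1.2, 0.9, 1.0
q, r, q_terminal, horizon, x0 = 1.0, 1.0, 1.0, 20, 1.0

nominal_gains, _ = riccati_gains(a_nominal, b, q, r, q_terminal, horizon)
optimal_gains, p0 = riccati_gains(a_true, b, q, r, q_terminal, horizon)

cost_nominal = rollout_cost(a_true, b, q, r, q_terminal, nominal_gains, x0)
cost_optimal = rollout_cost(a_true, b, q, r, q_terminal, optimal_gains, x0)

print(f"cost with nominal-model gains on the true system: {cost_nominal:.4f}")
print(f"cost with true-model optimal gains:               {cost_optimal:.4f}")
```

The nominal-model gains incur a higher cost on the true system than the true optimal gains. In the abstract's terms, the per-stage weights are what would adjust the proxy objective so that the model-based design recovers the optimal policy; whether those weights can be fixed in advance or must be learned is the question the paper addresses.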
