Datamodel-Based Data Selection for Nonlinear Data-Enabled Predictive Control
Jiachen Li, Shihao Li, Dongmei Chen
TL;DR
This work tackles data selection for nonlinear DeePC by introducing a context-dependent linear datamodel that maps the initial trajectory and reference trajectory to column-inclusion scores. A neural network amortizes the context-to-influence mapping, enabling online Top-K selection that shrinks the DeePC problem from all Hankel columns to a small, task-relevant subset. Offline training uses closed-loop simulations to learn how column inclusion affects cost across tasks. Experiments show substantial performance gains over geometry-based selection, especially at small subset sizes, with performance converging toward the full-data result as the amount of selected data grows. The approach provides a scalable, task-aware framework for data-driven control of nonlinear systems that integrates directly into online DeePC.
Abstract
Data-Enabled Predictive Control (DeePC) has emerged as a powerful framework for controlling unknown systems directly from input-output data. For nonlinear systems, recent work has proposed selecting relevant subsets of data columns based on geometric proximity to the current operating point. However, such proximity-based selection ignores the control objective: different reference trajectories may benefit from different data even at the same operating point. In this paper, we propose a datamodel-based approach that learns a context-dependent influence function mapping the current initial trajectory and reference trajectory to column importance scores. Adapting the linear datamodel framework from machine learning, we model closed-loop cost as a linear function of column inclusion indicators, with coefficients that depend on the control context. Training on closed-loop simulations, our method captures which data columns actually improve tracking performance for specific control tasks. Experimental results demonstrate that task-aware selection substantially outperforms geometry-based heuristics, particularly when using small data subsets.
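The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`build_hankel`, `predicted_cost`, `select_top_k`), the matrix `W` standing in for the trained network, and all dimensions are hypothetical. It also assumes a sign convention in which a more negative coefficient means that including the column lowers the predicted closed-loop cost, so Top-K keeps the K smallest coefficients.

```python
import numpy as np

def build_hankel(signal: np.ndarray, depth: int) -> np.ndarray:
    """Stack length-`depth` windows of a (T, m) signal as columns.

    Returns an array of shape (depth * m, T - depth + 1).
    """
    T = signal.shape[0]
    cols = [signal[i:i + depth].reshape(-1) for i in range(T - depth + 1)]
    return np.stack(cols, axis=1)

def predicted_cost(theta: np.ndarray, bias: float, mask: np.ndarray) -> float:
    """Linear datamodel: cost ~= bias + sum_j theta_j * mask_j, with mask_j in {0, 1}."""
    return float(bias + theta @ mask)

def select_top_k(theta: np.ndarray, k: int) -> np.ndarray:
    """Return the sorted indices of the k columns with the smallest coefficients.

    Assumed convention: smaller theta_j means a larger predicted cost reduction.
    """
    idx = np.argsort(theta, kind="stable")[:k]
    return np.sort(idx)

# Toy data: a single-channel recorded signal and its Hankel matrix.
rng = np.random.default_rng(0)
u = rng.standard_normal((50, 1))
H = build_hankel(u, depth=6)

# Context: stacked initial and reference trajectories (hypothetical size 8).
context = rng.standard_normal(8)

# Placeholder for the trained network: a fixed linear map from context
# to one coefficient per Hankel column.
W = rng.standard_normal((H.shape[1], context.size))
theta = W @ context

# Keep only the K highest-ranked columns for the online DeePC problem.
k = 10
idx = select_top_k(theta, k)
H_sub = H[:, idx]
```

Under the linear model, the Top-K mask minimizes the predicted cost over all subsets of size K, which is why a single sort suffices online and no combinatorial search is needed.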
