Datamodel-Based Data Selection for Nonlinear Data-Enabled Predictive Control

Jiachen Li, Shihao Li, Dongmei Chen

TL;DR

This work tackles data selection for nonlinear DeePC by introducing a context-dependent linear datamodel that maps the initial trajectory and reference trajectory to column-inclusion scores. A neural network amortizes the context-to-influence mapping, enabling online Top-K selection that shrinks the DeePC problem from all Hankel columns to a small, task-relevant subset. Offline training uses closed-loop simulations to learn how column inclusion affects cost across tasks. Experiments show substantial gains over geometry-based selection, especially at small subset sizes, with performance converging toward the full-data result as more columns are selected. The approach provides a scalable, task-aware framework for data-driven control of nonlinear systems that integrates directly into online DeePC.
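The online Top-K step can be illustrated with a minimal Python sketch. It assumes a score vector is already available (in the paper it would come from the learned network) and that a negative score means "including this column lowers the predicted cost"; that sign convention, the function name `select_top_k_columns`, and the toy matrix are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def select_top_k_columns(hankel, influence, k):
    """Keep the k Hankel columns with the lowest (most cost-reducing) scores.

    hankel    : (rows, M) data matrix whose columns are candidate trajectories.
    influence : length-M vector of predicted cost influence per column
                (assumed convention: more negative = more helpful).
    k         : number of columns to keep, 1 <= k <= M.
    """
    hankel = np.asarray(hankel)
    influence = np.asarray(influence, dtype=float)
    if hankel.ndim != 2 or hankel.shape[1] != influence.shape[0]:
        raise ValueError("need one influence score per Hankel column")
    if not 1 <= k <= influence.shape[0]:
        raise ValueError("k must be between 1 and the number of columns")

    # Ascending sort puts the most cost-reducing columns first.
    chosen = np.argsort(influence, kind="stable")[:k]
    # Restore the original column order so the reduced matrix stays ordered.
    chosen = np.sort(chosen)
    return chosen, hankel[:, chosen]


# Toy usage: 4 candidate columns, keep the 2 predicted to help most.
H = np.arange(12).reshape(3, 4)
scores = np.array([0.5, -1.0, 0.2, -0.3])
idx, H_small = select_top_k_columns(H, scores, k=2)
```

The reduced matrix `H_small` would then replace the full Hankel matrix in the DeePC optimization at that time step.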

Abstract

Data-Enabled Predictive Control (DeePC) has emerged as a powerful framework for controlling unknown systems directly from input-output data. For nonlinear systems, recent work has proposed selecting relevant subsets of data columns based on geometric proximity to the current operating point. However, such proximity-based selection ignores the control objective: different reference trajectories may benefit from different data even at the same operating point. In this paper, we propose a datamodel-based approach that learns a context-dependent influence function mapping the current initial trajectory and reference trajectory to column importance scores. Adapting the linear datamodel framework from machine learning, we model closed-loop cost as a linear function of column inclusion indicators, with coefficients that depend on the control context. Training on closed-loop simulations, our method captures which data columns actually improve tracking performance for specific control tasks. Experimental results demonstrate that task-aware selection substantially outperforms geometry-based heuristics, particularly when using small data subsets.
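To make the "linear function of column inclusion indicators" concrete, the following Python sketch fits such a model for a single fixed context by ridge-regularized least squares. It is a simplification: the paper makes the coefficients depend on the control context through a neural network, which is not reproduced here. The function name `fit_linear_datamodel`, the ridge penalty, and the synthetic costs standing in for closed-loop simulation results are all assumptions.

```python
import numpy as np


def fit_linear_datamodel(masks, costs, ridge=1e-6):
    """Fit cost ~ bias + sum_j theta_j * mask_j for one fixed context.

    masks : (n_trials, M) binary array; masks[i, j] = 1 if column j was
            included in the data subset used for trial i.
    costs : length-n_trials vector of closed-loop costs observed per trial.
    ridge : small L2 penalty on theta (not on the bias) for stability.
    """
    masks = np.asarray(masks, dtype=float)
    costs = np.asarray(costs, dtype=float)
    n_trials, n_cols = masks.shape

    # Prepend a column of ones so the first weight acts as the bias term.
    X = np.hstack([np.ones((n_trials, 1)), masks])
    penalty = ridge * np.eye(n_cols + 1)
    penalty[0, 0] = 0.0  # leave the bias unpenalized

    weights = np.linalg.solve(X.T @ X + penalty, X.T @ costs)
    return weights[0], weights[1:]


# Synthetic stand-in for simulation data: costs generated by a known model.
rng = np.random.default_rng(0)
true_theta = np.array([-1.0, 0.5, 0.0, -0.2, 0.3, -0.7])
masks = rng.integers(0, 2, size=(200, true_theta.size))
costs = 2.0 + masks @ true_theta
bias, theta = fit_linear_datamodel(masks, costs)
```

Under the assumed convention, columns with negative fitted coefficients are the ones whose inclusion tends to reduce cost, so they are natural candidates for a small selected subset.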

Paper Structure

This paper contains 30 sections, 20 equations, 1 figure, 2 algorithms.

Figures (1)

  • Figure 1: Closed-loop cost versus number of selected columns on the Reacher environment ($M = 36{,}800$ total columns). Line styles indicate selection methods (solid: Datamodel, dash-dot: $L_1$, dotted: Isomap, dashed: Random). Colors indicate cost metrics (blue: Cost, orange: IAE, green: ISE). Lower is better. Our datamodel-based approach achieves approximately $2\times$ lower cost than geometric baselines at small subset sizes.