From Features to States: Data-Driven Selection of Measured State Variables via RFE-DMDc
Haoyu Wang, Andrea Alfonsi, Roberto Ponciroli, Richard Vilim
TL;DR
This work tackles data-driven state-variable selection for dynamical systems by seeking a minimal, physically interpretable set of measured variables that can serve as the surrogate state for linear control-oriented models. It introduces RFE-DMDc, which combines Recursive Feature Elimination with Dynamic Mode Decomposition with Control and includes a cross-subsystem balancing step to prevent dominance by high-gain subsystems. On an RLC benchmark with known ground truth and a large Integrated Energy System with thousands of candidate variables, RFE-DMDc identifies compact state sets (≈10 variables) that achieve test accuracy comparable to a GA-based baseline (GA-DMDc) at an order of magnitude lower computational cost, while preserving physical interpretability. The method supports sensor planning and Digital Twin deployment by prioritizing measured variables that maximize dynamical predictability with minimal sensing overhead.
Abstract
The behavior of a dynamical system under a given set of inputs can be captured by tracking the response of an optimal subset of process variables (\textit{state variables}). For many engineering systems, however, first-principles, model-based identification is impractical, motivating data-driven approaches for Digital Twins used in control and diagnostics. In this paper, we present RFE-DMDc, a supervised, data-driven workflow that uses Recursive Feature Elimination (RFE) to select a minimal, physically meaningful set of variables to monitor and then derives a linear state-space model via Dynamic Mode Decomposition with Control (DMDc). The workflow includes a cross-subsystem selection step that mitigates feature \textit{overshadowing} in multi-component systems. To corroborate the results, we implement a GA-DMDc baseline that jointly optimizes the state set and model fit under a common accuracy cost on states and outputs. On an RLC benchmark with known ground truth and a realistic Integrated Energy System (IES) with multiple thermally coupled components and thousands of candidate variables, RFE-DMDc consistently recovers compact state sets (\(\approx 10\) variables) that achieve test errors comparable to GA-DMDc while requiring an order of magnitude less computational time. The selected variables retain a clear physical interpretation across subsystems, and the resulting models demonstrate competitive predictive accuracy, computational efficiency, and robustness to overfitting.
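To make the select-then-identify idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain least-squares DMDc fit and uses a greedy backward elimination as a simple stand-in for RFE; the cross-subsystem balancing step is not shown. The function names (`fit_dmdc`, `output_cost`, `backward_select`), the one-step output cost, and the toy two-state system with three pure-noise candidates are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_dmdc(X, U):
    """Least-squares DMDc fit of x[k+1] ~ A x[k] + B u[k].

    X has shape (n, T) and U has shape (m, T); columns are time samples.
    """
    n = X.shape[0]
    omega = np.vstack([X[:, :-1], U[:, :-1]])   # stacked state/input snapshots
    G = X[:, 1:] @ np.linalg.pinv(omega)        # G = [A B]
    return G[:, :n], G[:, n:]

def output_cost(Z, U, Y, idx):
    """Relative one-step output error when the candidates in idx act as states."""
    X = Z[idx, :]
    A, B = fit_dmdc(X, U)
    C = Y @ np.linalg.pinv(X)                   # output map y ~ C x
    Y_pred = C @ (A @ X[:, :-1] + B @ U[:, :-1])
    return np.linalg.norm(Y[:, 1:] - Y_pred) / np.linalg.norm(Y[:, 1:])

def backward_select(Z, U, Y, n_keep):
    """Greedy backward elimination (an assumed stand-in for RFE).

    Each round drops the candidate whose removal leaves the lowest cost.
    """
    selected = list(range(Z.shape[0]))
    while len(selected) > n_keep:
        costs = [output_cost(Z, U, Y, [j for j in selected if j != i])
                 for i in selected]
        selected.pop(int(np.argmin(costs)))
    return selected

# Toy data (hypothetical): two true states driven by one input,
# plus three white-noise candidates that carry no dynamics.
rng = np.random.default_rng(0)
T = 300
A_true = np.array([[0.9, 0.1], [-0.1, 0.8]])
B_true = np.array([[0.5], [1.0]])
U = rng.standard_normal((1, T))
x = np.zeros((2, T))
for k in range(T - 1):
    x[:, k + 1] = A_true @ x[:, k] + B_true @ U[:, k]
Z = np.vstack([x, rng.standard_normal((3, T))])  # rows 0-1 are the true states
Y = np.array([[1.0, 1.0]]) @ x                   # measured output

kept = backward_select(Z, U, Y, n_keep=2)
print(sorted(kept))
```

In this noise-free toy case the noise candidates add nothing to the output prediction, so they are eliminated before either true state. With thousands of candidates, re-fitting the model for every possible removal would be costly, and a ranking-based elimination would likely be preferable.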
