Information theory and discriminative sampling for model discovery
Yuxuan Bao, J. Nathan Kutz
TL;DR
This work integrates Fisher information and entropy-based metrics into data-driven model discovery with SINDy to quantify how much information different data segments contribute to learning. It derives a Fisher Information Matrix (FIM) framework for SINDy, analyzes the spectral properties of the FIM, and demonstrates that discriminative sampling and bagging can substantially improve identification accuracy with less data. The authors show that information-guided data collection, active control, and entropy-search strategies yield faster convergence in both single-trajectory and multi-trajectory scenarios, including chaotic systems. Practical implications include better sampling design, improved robustness under noise, and a roadmap for integrating information theory with adaptive sensing in dynamical-system identification.
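A minimal sketch of FIM-guided discriminative sampling follows, under stated assumptions. It treats the SINDy regression dx_k/dt ≈ Θ(X)ξ_k as linear least squares with i.i.d. Gaussian derivative noise of standard deviation σ. Under that assumption, each time window w contributes a Fisher Information Matrix F_w = Θ_wᵀΘ_w/σ², and the FIMs of disjoint windows add. The degree-2 polynomial library, the Lorenz test trajectory, and the window length are illustrative choices. The greedy D-optimal (log-determinant) selection rule is a standard optimal-design criterion swapped in for illustration, not necessarily the paper's procedure. Helper names such as `window_fims` and `greedy_d_optimal` are hypothetical.

```python
import numpy as np

def lorenz(x, s=10.0, r=28.0, b=8.0 / 3.0):
    """Lorenz vector field with the standard chaotic parameters."""
    return np.array([s * (x[1] - x[0]), x[0] * (r - x[2]) - x[1], x[0] * x[1] - b * x[2]])

def rk4_trajectory(f, x0, dt, n_steps):
    """Integrate dx/dt = f(x) with fixed-step RK4; returns an (n_steps, n) array."""
    X = np.empty((n_steps, len(x0)))
    x = np.asarray(x0, dtype=float)
    for t in range(n_steps):
        X[t] = x
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

def poly_library(X):
    """Hypothetical degree-2 SINDy library: [1, x_i, x_i * x_j for i <= j]."""
    n = X.shape[1]
    cols = [np.ones(len(X))] + [X[:, i] for i in range(n)]
    cols += [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack(cols)

def window_fims(Theta, window, noise_std=1.0):
    """Assumed per-window FIM Theta_w^T Theta_w / noise_std^2 (linear-Gaussian model)."""
    return [Theta[s:s + window].T @ Theta[s:s + window] / noise_std**2
            for s in range(0, len(Theta) - window + 1, window)]

def greedy_d_optimal(fims, k, ridge=1e-8):
    """Greedily add the window that most increases log det of the summed FIM."""
    total = ridge * np.eye(fims[0].shape[0])  # tiny ridge keeps log det finite
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(len(fims)) if i not in chosen]
        gains = [np.linalg.slogdet(total + fims[i])[1] for i in candidates]
        best = candidates[int(np.argmax(gains))]
        chosen.append(best)
        total = total + fims[best]  # window FIMs are additive
    return chosen, total

# Illustrative data: one chaotic Lorenz trajectory split into 20 windows.
X = rk4_trajectory(lorenz, [-8.0, 8.0, 27.0], dt=0.01, n_steps=2000)
fims = window_fims(poly_library(X), window=100)
chosen, F_sel = greedy_d_optimal(fims, k=5)
print("selected windows:", chosen)
```

Because window FIMs are additive, each greedy step needs only one log-determinant per remaining candidate. The tiny ridge term keeps that log-determinant finite even if the running sum were rank-deficient.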
Abstract
Fisher information and Shannon entropy are fundamental tools for understanding and analyzing dynamical systems from complementary perspectives. They can characterize unknown parameters by quantifying the information contained in variables, or measure how trajectories from different initial conditions, or different temporal segments of a single trajectory, contribute to learning or inferring system dynamics. In this work, we leverage the Fisher Information Matrix (FIM) within the data-driven framework of sparse identification of nonlinear dynamics (SINDy). We visualize information patterns in chaotic and non-chaotic systems for both single trajectories and multiple initial conditions, demonstrating how information-based analysis can improve sampling efficiency and enhance model performance by prioritizing more informative data. The benefits of statistical bagging are further elucidated through spectral analysis of the FIM. We also illustrate how Fisher information and entropy metrics can promote data efficiency in three scenarios: when only a single trajectory is available, when a tunable control parameter exists, and when multiple trajectories can be freely initialized. As data-driven model discovery continues to gain prominence, principled sampling strategies guided by quantifiable information metrics offer a powerful approach for improving learning efficiency and reducing data requirements.
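The spectral and entropy quantities in the abstract can be illustrated with a small sketch. It assumes a linear-Gaussian reading of SINDy: with library matrix Θ and i.i.d. Gaussian noise of standard deviation σ on the derivative estimates, the FIM for each coefficient vector is F = ΘᵀΘ/σ². Its eigenvalues show which coefficient combinations are well or poorly constrained by the data. For ordinary least squares on a fixed library, the coefficient estimate has covariance F⁻¹, whose Gaussian differential entropy 0.5(p log(2πe) - log det F) serves here as an entropy proxy. SINDy's sparse thresholding only approximately follows this least-squares picture. The damped-oscillator data, the six-term library, the noise level, and the bootstrap loop (resampling rows as in bagging) are hypothetical illustrations, not the paper's experiments.

```python
import numpy as np

# Hypothetical test system: a damped linear oscillator sampled in closed form.
t = np.linspace(0.0, 20.0, 1000)
x = np.exp(-0.1 * t) * np.cos(2.0 * t)
y = np.exp(-0.1 * t) * np.sin(2.0 * t)

# Hypothetical degree-2 polynomial library Theta = [1, x, y, x^2, xy, y^2].
Theta = np.column_stack([np.ones_like(t), x, y, x**2, x * y, y**2])

def fisher_information(Theta, noise_std=0.1):
    """Assumed FIM of SINDy coefficients under i.i.d. Gaussian noise on dx/dt."""
    return Theta.T @ Theta / noise_std**2

def gaussian_entropy(F):
    """Differential entropy (nats) of a Gaussian with covariance F^{-1}; F must be PD."""
    p = F.shape[0]
    _, logdet = np.linalg.slogdet(F)
    return 0.5 * (p * np.log(2.0 * np.pi * np.e) - logdet)

F = fisher_information(Theta)
eigs = np.linalg.eigvalsh(F)  # ascending; eigs[0] is the least-informed direction
print("FIM spectrum:", eigs)
print("condition number:", eigs[-1] / eigs[0])
print("entropy proxy (nats):", gaussian_entropy(F))

# Bootstrap resamples of the rows, as in bagging, give one FIM per bag.
rng = np.random.default_rng(0)
bag_min_eigs = []
for _ in range(20):
    idx = rng.integers(0, len(t), size=len(t))
    bag_min_eigs.append(np.linalg.eigvalsh(fisher_information(Theta[idx]))[0])
print("smallest eigenvalue across bags (min, max):", min(bag_min_eigs), max(bag_min_eigs))
```

Comparing the smallest eigenvalue across bags with the full-data spectrum is one simple way to inspect how resampling perturbs the least-informed parameter direction. Doubling the information (for example, by halving σ²) lowers the entropy proxy by exactly 0.5·p·log 2 nats.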
