Accelerating scientific discovery with the common task framework
J. Nathan Kutz, Peter Battaglia, Michael Brenner, Kevin Carlberg, Aric Hagberg, Shirley Ho, Stephan Hoyer, Henning Lange, Hod Lipson, Michael W. Mahoney, Frank Noe, Max Welling, Laure Zanna, Francis Zhu, Steven L. Brunton
TL;DR
The paper proposes the Common Task Framework (CTF) for fair, objective benchmarking of ML/AI methods on dynamic systems, using withheld test sets and multi-metric evaluation. The framework combines a permanent collection of toy dynamical models with rotating real-world datasets, and is structured around 12 challenge matrices $X_J$ with corresponding scores (e.g., $E_1$–$E_{12}$) that evaluate forecasting, reconstruction, limited-data performance, and parametric generalization. The paper situates the CTF within a broader discussion of inductive versus deductive reasoning, advocating physics-informed constraints and transparent evaluation workflows, including a Sage Bionetworks referee and GitHub-based reproducibility. The framework is designed to be laptop-accessible, scalable, and adaptable to diverse scientific domains, with the aim of accelerating responsible, cross-disciplinary progress in data-driven science and engineering.
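To make the idea of scoring against a withheld test set concrete, here is a minimal sketch of how a referee might turn a submission into per-task scores. The scoring formula (100 times one minus the relative Euclidean error) is an assumption chosen for illustration and is not specified in this summary; the function names `relative_error_score` and `evaluate_submission`, and the task labels `E1` and `E2`, are likewise hypothetical.

```python
import math


def relative_error_score(truth, prediction):
    """Hypothetical score: 100 * (1 - ||prediction - truth|| / ||truth||).

    A perfect prediction scores 100; predicting all zeros scores 0.
    This formula is an illustrative assumption, not the CTF definition.
    """
    if len(truth) != len(prediction):
        raise ValueError("truth and prediction must have the same length")
    error_norm = math.sqrt(sum((p - t) ** 2 for p, t in zip(prediction, truth)))
    truth_norm = math.sqrt(sum(t ** 2 for t in truth))
    if truth_norm == 0.0:
        raise ValueError("truth has zero norm; relative error is undefined")
    return 100.0 * (1.0 - error_norm / truth_norm)


def evaluate_submission(withheld, submission):
    """Score every task in the withheld set and add a simple mean.

    `withheld` and `submission` map task labels to flat lists of floats.
    Only the referee holds `withheld`; participants supply `submission`.
    """
    scores = {
        task: relative_error_score(truth, submission[task])
        for task, truth in withheld.items()
    }
    mean_score = sum(scores.values()) / len(scores)
    scores["mean"] = mean_score
    return scores


# Toy usage: one perfect forecast and one all-zero forecast.
withheld = {"E1": [3.0, 4.0], "E2": [3.0, 4.0]}
submission = {"E1": [3.0, 4.0], "E2": [0.0, 0.0]}
result = evaluate_submission(withheld, submission)
```

Reporting each task score alongside the mean, rather than the mean alone, reflects the multi-metric emphasis: a method that forecasts well but reconstructs poorly remains visible as such.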
Abstract
Machine learning (ML) and artificial intelligence (AI) algorithms are transforming and empowering the characterization and control of dynamic systems in the engineering, physical, and biological sciences. These emerging modeling paradigms require comparative metrics to evaluate a diverse set of scientific objectives, including forecasting, state reconstruction, generalization, and control, while also considering limited data scenarios and noisy measurements. We introduce a common task framework (CTF) for science and engineering, which features a growing collection of challenge data sets with a diverse set of practical and common objectives. The CTF is a critically enabling technology that has contributed to the rapid advance of ML/AI algorithms in traditional applications such as speech recognition, language processing, and computer vision. There is a critical need for the objective metrics of a CTF to compare the diverse algorithms being rapidly developed and deployed in practice today across science and engineering.
