Hybrid additive modeling with partial dependence for supervised regression and dynamical systems forecasting
Yann Claes, Vân Anh Huynh-Thu, Pierre Geurts
TL;DR
The paper addresses the challenge of blending prior physics-informed terms with data-driven models in supervised regression and dynamical forecasting. It introduces a Partial Dependence (PD) based training approach that reduces or eliminates heavy regularization, and it assesses PD alongside sequential and alternating training across synthetic and real regression tasks as well as Neural ODE-based dynamical forecasting. Results show that PD-based training often yields robust estimates of the parametric prior, improved generalization over purely data-driven models, and competitive or superior performance in dynamical systems, though its advantages can diminish when there is substantial overlap between the known and learned components. The work demonstrates the versatility of PD-based optimization as a model-agnostic method, highlights conditions under which it excels (e.g., disjoint feature sets), and points to future theoretical and methodological refinements for hybrid physics-ML models.
Abstract
Learning processes by exploiting restricted domain knowledge is an important task across many scientific areas, and a growing number of hybrid training methods additively combine data-driven and model-based approaches. Although the resulting models are more accurate than purely data-driven ones, the optimization process usually relies on sensitive regularization constraints. Furthermore, while such hybrid methods have been applied in various scientific domains, they have mostly been evaluated on dynamical systems, with only limited study of the influence of each model component on global performance and parameter identification. In this work, we introduce a new hybrid training approach based on partial dependence, which removes the need for intricate regularization. Moreover, we assess the performance of hybrid modeling against traditional machine learning methods on standard regression problems. We compare, on both synthetic and real regression problems, several approaches for training such hybrid models. We focus on hybrid methods that additively combine a parametric term with a machine learning term and investigate model-agnostic training procedures. Accordingly, experiments are carried out with different types of machine learning models, including tree-based models and artificial neural networks. We also extend our partial dependence optimization process to dynamical systems forecasting and compare it to existing schemes.
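To make the idea of an additive hybrid model trained through partial dependence more concrete, the following is a minimal sketch of one plausible reading of the approach; it is not the authors' implementation. It assumes a toy target y = θ·x1 + g(x2) with independent features, a linear prior h_θ(x1) = θ·x1, and a hypothetical piecewise-constant `BinnedRegressor` standing in for any machine learning model. All names, the data-generating process, and the three-step procedure (fit the ML model, fit θ to its partial dependence on x1, refit the ML model on the residuals) are illustrative assumptions.

```python
import math
import random


class BinnedRegressor:
    """Hypothetical toy model: piecewise-constant regressor on [0, 1)^2.

    It stands in for any ML model (trees, neural networks, ...).
    """

    def __init__(self, n_bins=8):
        self.n_bins = n_bins

    def _bin(self, v):
        return min(int(v * self.n_bins), self.n_bins - 1)

    def fit(self, X, y):
        sums, counts = {}, {}
        for (a, b), target in zip(X, y):
            key = (self._bin(a), self._bin(b))
            sums[key] = sums.get(key, 0.0) + target
            counts[key] = counts.get(key, 0) + 1
        self.cell_mean = {k: sums[k] / counts[k] for k in sums}
        self.global_mean = sum(y) / len(y)  # fallback for empty cells
        return self

    def predict_one(self, a, b):
        key = (self._bin(a), self._bin(b))
        return self.cell_mean.get(key, self.global_mean)


def partial_dependence(model, X, grid):
    """Average prediction over the data with x1 forced to each grid value."""
    return [sum(model.predict_one(v, b) for _, b in X) / len(X) for v in grid]


def fit_slope(xs, ys):
    """Least-squares slope of ys on xs; the intercept is discarded because
    the residual model can absorb any constant offset."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def train_hybrid(X, y, n_bins=8):
    # Step 1: fit the ML model alone on the full input.
    full_model = BinnedRegressor(n_bins).fit(X, y)
    # Step 2: fit the parametric prior to the model's partial dependence on x1.
    grid = [(i + 0.5) / n_bins for i in range(n_bins)]
    pd_values = partial_dependence(full_model, X, grid)
    theta = fit_slope(grid, pd_values)
    # Step 3: refit the ML model on what the prior leaves unexplained.
    residuals = [t - theta * a for (a, _), t in zip(X, y)]
    residual_model = BinnedRegressor(n_bins).fit(X, residuals)
    return theta, residual_model


def hybrid_predict(theta, residual_model, a, b):
    """Additive hybrid prediction: parametric term plus ML term."""
    return theta * a + residual_model.predict_one(a, b)


# Synthetic data (assumed for illustration): true theta is 2.0.
rng = random.Random(0)
X = [(rng.random(), rng.random()) for _ in range(2000)]
y = [2.0 * a + math.sin(3.0 * b) + rng.gauss(0.0, 0.1) for a, b in X]

theta_hat, residual_model = train_hybrid(X, y)
```

In this sketch no regularization term is needed to separate the two components, because the parametric term is fitted to the partial dependence rather than jointly with the ML term. The setup is deliberately favorable: x1 and x2 are independent and the prior depends only on x1, so how well θ would be recovered under correlated or overlapping features is not something this toy example addresses.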
