Explainable data-driven modeling via mixture of experts: towards effective blending of grey and black-box models
Jessica Leoni, Valentina Breschi, Simone Formentin, Mara Tanelli
TL;DR
This work tackles the challenge of blending physics-informed (grey-box) priors with data-driven (black-box) models for complex systems by proposing a general mixture of experts (MoE) framework. It introduces a novel fitting objective $J(X,Y;\Omega,\Theta)=\ell(X,Y;\Omega,\Theta)+r(\Theta)+\mathcal{L}(\Omega)$ that permits independent training of the local experts and combines them convexly and smoothly in time through the weights $\Omega(t)$, with the weight-shaping term $\mathcal{L}(\Omega)$ discouraging abrupt switches. The authors present both a coordinate-descent learning algorithm and an ADMM-based approach to train the mixtures, including convergence guarantees under convexity and a windowed strategy for scalability; they also provide a gating mechanism to predict the weights on new data. Case studies, comprising a numerical example and an experimental side-slip estimation problem, demonstrate that the method yields interpretable, accurate model blends that can outperform serial and parallel baselines as well as existing grey-box/black-box hybrids, while the weight trajectories keep the result explainable. The framework has practical implications for control and system identification, enabling reliable, interpretable data-driven modeling that respects physical priors and temporal coherence.
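The summary leaves the exact forms of $\ell$, $r$ and $\mathcal{L}$ unspecified. As a minimal sketch, assuming a squared-error loss for $\ell$, a squared first-difference penalty for $\mathcal{L}(\Omega)$, and already-trained experts whose predictions are held fixed (so $r(\Theta)$ is constant and omitted), the weight block of one coordinate-descent pass could look as follows. The projected-gradient update and the names `project_simplex`, `objective`, `update_weights` and `lam` are hypothetical illustrations, not the authors' algorithm.

```python
# Hypothetical sketch of the Omega-block of a mixture-of-experts fit.
# Assumptions (not from the paper): squared-error loss, squared
# first-difference weight-shaping penalty, fixed expert predictions,
# projected gradient descent as the weight update.


def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:
            theta = t  # threshold of the largest index still satisfying the test
    return [max(vi - theta, 0.0) for vi in v]


def objective(y, preds, omega, lam):
    """Fit term plus lam * sum_t ||Omega(t) - Omega(t-1)||^2 (experts fixed)."""
    T, K = len(y), len(preds)
    fit = sum((y[t] - sum(omega[t][k] * preds[k][t] for k in range(K))) ** 2
              for t in range(T))
    smooth = sum((omega[t][k] - omega[t - 1][k]) ** 2
                 for t in range(1, T) for k in range(K))
    return fit + lam * smooth


def update_weights(y, preds, omega, lam, iters=500):
    """Projected gradient descent on Omega; every row stays on the simplex."""
    T, K = len(y), len(preds)
    # Step 1/L, with L bounding the Hessian's largest eigenvalue:
    # 2 * max_t ||yhat(t)||^2 from the fit term plus 8 * lam from the path penalty.
    lip = 2 * max(sum(preds[k][t] ** 2 for k in range(K)) for t in range(T)) + 8 * lam
    step = 1.0 / max(lip, 1e-12)
    omega = [row[:] for row in omega]  # do not mutate the caller's weights
    for _ in range(iters):
        grad = [[0.0] * K for _ in range(T)]
        for t in range(T):
            resid = y[t] - sum(omega[t][k] * preds[k][t] for k in range(K))
            for k in range(K):
                g = -2.0 * resid * preds[k][t]
                if t > 0:
                    g += 2.0 * lam * (omega[t][k] - omega[t - 1][k])
                if t < T - 1:
                    g -= 2.0 * lam * (omega[t + 1][k] - omega[t][k])
                grad[t][k] = g
        omega = [project_simplex([omega[t][k] - step * grad[t][k] for k in range(K)])
                 for t in range(T)]
    return omega


# Toy usage: expert 0 is accurate for t < 10, expert 1 afterwards.
T = 20
y = [1.0] * T
preds = [[1.0 if t < 10 else 3.0 for t in range(T)],
         [3.0 if t < 10 else 1.0 for t in range(T)]]
omega0 = [[0.5, 0.5] for _ in range(T)]
omega = update_weights(y, preds, omega0, lam=0.1)
```

Projecting each row onto the probability simplex keeps $\Omega(t)$ a convex combination at every iteration, and the step $1/L$ makes each projected-gradient step non-increasing in the objective. In this sketch, a full coordinate-descent pass would alternate this weight update with refitting each expert's parameters $\Theta$.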
Abstract
Traditional models grounded in first principles often struggle with accuracy as the system's complexity increases. Conversely, machine learning approaches, while powerful, face challenges in interpretability and in handling physical constraints. Efforts to combine these models often struggle to strike a balance between accuracy and complexity. To address these issues, we propose a comprehensive framework based on a "mixture of experts" rationale. This approach enables the data-based fusion of diverse local models, leveraging the full potential of first-principle-based priors. Our solution allows independent training of experts, drawing on techniques from both machine learning and system identification, and it supports both collaborative and competitive learning paradigms. To enhance interpretability, we penalize abrupt variations in the experts' combination. Experimental results validate the effectiveness of our approach in producing an interpretable combination of models closely resembling the target phenomena.
