Explainable data-driven modeling via mixture of experts: towards effective blending of grey and black-box models

Jessica Leoni, Valentina Breschi, Simone Formentin, Mara Tanelli

TL;DR

This work tackles the challenge of blending physics-informed (grey-box) priors with data-driven (black-box) models for complex systems by proposing a general mixture of experts (MoE) framework. It introduces a fitting objective $J(X,Y;\Omega,\Theta)=\ell(X,Y;\Omega,\Theta)+r(\Theta)+\mathcal{L}(\Omega)$ that permits independent training of the local experts and combines them through a convex, temporally smooth set of weights $\Omega(t)$, where the weight-shaping term $\mathcal{L}(\Omega)$ discourages abrupt switches. The authors present both a coordinate-descent learning algorithm and an ADMM-based approach to train the mixtures, with convergence guarantees under convexity and a windowed strategy for scalability; they also provide a gating mechanism to predict the weights on new data. Two case studies, a numerical example and an experimental side-slip estimation problem, indicate that the method yields interpretable, accurate model blends that can outperform serial/parallel baselines and existing grey-box/black-box hybrids, while the weight trajectories keep the blend explainable. The framework is relevant to control and system identification, where data-driven models are expected to respect physical priors and temporal coherence.

Abstract

Traditional models grounded in first principles often struggle with accuracy as the system's complexity increases. Conversely, machine learning approaches, while powerful, face challenges in interpretability and in handling physical constraints. Efforts to combine these models often stumble upon difficulties in finding a balance between accuracy and complexity. To address these issues, we propose a comprehensive framework based on a "mixture of experts" rationale. This approach enables the data-based fusion of diverse local models, leveraging the full potential of first-principle-based priors. Our solution allows independent training of experts, drawing on techniques from both machine learning and system identification, and it supports both collaborative and competitive learning paradigms. To enhance interpretability, we penalize abrupt variations in the experts' combination. Experimental results validate the effectiveness of our approach in producing an interpretable combination of models closely resembling the target phenomena.
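
As a rough illustration of the three-term objective $J=\ell+r(\Theta)+\mathcal{L}(\Omega)$ described in the TL;DR, the sketch below evaluates one possible instance of it. The specific choices are assumptions made for illustration and are not taken from the paper: a mean squared error for $\ell$, a ridge penalty for $r(\Theta)$, and a squared difference between consecutive weight vectors for $\mathcal{L}(\Omega)$. The function name `mixture_objective` and the parameters `lam_theta` and `lam_smooth` are hypothetical.

```python
import numpy as np


def mixture_objective(y, expert_preds, weights, thetas,
                      lam_theta=1e-2, lam_smooth=1.0):
    """Evaluate an illustrative J = fit + r(Theta) + L(Omega).

    y            : (T,) target samples
    expert_preds : (T, M) prediction of each of the M experts at each time
    weights      : (T, M) mixture weights, each row on the probability simplex
    thetas       : list of parameter arrays, one per expert (may be empty)
    """
    y = np.asarray(y, dtype=float)
    expert_preds = np.asarray(expert_preds, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Convex combination: weights must be non-negative and sum to one per sample.
    if np.any(weights < 0) or not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("each row of `weights` must lie on the simplex")

    # Fitting term (assumed form): squared error of the blended prediction.
    y_hat = np.sum(weights * expert_preds, axis=1)
    fit = np.mean((y - y_hat) ** 2)

    # Parameter regularization (assumed form): ridge penalty on each expert.
    reg = lam_theta * sum(float(np.sum(np.asarray(th) ** 2)) for th in thetas)

    # Weight shaping (assumed form): penalize changes between consecutive weights.
    smooth = lam_smooth * float(np.sum(np.diff(weights, axis=0) ** 2))

    return fit + reg + smooth


if __name__ == "__main__":
    T = 4
    y = np.zeros(T)
    preds = np.zeros((T, 2))
    constant = np.full((T, 2), 0.5)
    switching = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
    # Identical fit, but the switching weights pay the smoothness penalty.
    print(mixture_objective(y, preds, constant, []))
    print(mixture_objective(y, preds, switching, []))
```

With identical fitting error, a weight sequence that alternates between experts at every step is charged more than a constant one, which is the qualitative behaviour the weight-shaping term is meant to produce.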

Paper Structure

This paper contains 18 sections, 6 theorems, 92 equations, 10 figures, 5 tables, and 1 algorithm.

Key Result

Lemma 1

Assume that the weights $\omega_{i}(t)$ in eq:mixture_of_models are restricted to lie in $\{0,1\}$ for all $i=1,\ldots,M$ and $t=1,\ldots,T$. Then the loss in eq:fitting_cost corresponds to the one introduced for jump-model fitting in [bemporad2018fitting].
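
The intuition behind the binary-weight case can be checked numerically. The sketch below assumes a squared-error fit and one-hot weight vectors (exactly one active expert per time step); it does not reproduce the paper's eq:fitting_cost or the loss of the cited jump-model work. Under these assumptions, the error of the blended prediction equals the error of the active expert alone, and a squared-difference penalty on consecutive weights reduces to a count of mode switches (each switch contributes 2). The helper names `one_hot_losses` and `switch_penalty` are hypothetical.

```python
import numpy as np


def one_hot_losses(y, expert_preds, modes):
    """Per-sample squared error of the blend vs. of the active expert only."""
    y = np.asarray(y, dtype=float)
    expert_preds = np.asarray(expert_preds, dtype=float)
    modes = np.asarray(modes, dtype=int)
    T, M = expert_preds.shape

    weights = np.eye(M)[modes]  # (T, M) one-hot rows, i.e. weights in {0, 1}
    blended = (y - np.sum(weights * expert_preds, axis=1)) ** 2
    active = (y - expert_preds[np.arange(T), modes]) ** 2
    return blended, active


def switch_penalty(modes, n_experts):
    """Squared-difference penalty on one-hot weights and the number of switches."""
    modes = np.asarray(modes, dtype=int)
    weights = np.eye(n_experts)[modes]
    penalty = float(np.sum(np.diff(weights, axis=0) ** 2))
    n_switches = int(np.sum(modes[1:] != modes[:-1]))
    return penalty, n_switches


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    modes = [0, 0, 0, 1, 1, 0]
    y = rng.normal(size=6)
    preds = rng.normal(size=(6, 2))
    blended, active = one_hot_losses(y, preds, modes)
    print(np.allclose(blended, active))
    print(switch_penalty(modes, 2))
```

In other words, with weights confined to $\{0,1\}$ each sample is charged to a single expert and the weight sequence behaves like a discrete mode sequence, which is the setting jump models address.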

Figures (10)

  • Figure 1: A schematic overview of the training [upper panels] and prediction [lower panels] logic of serial, parallel, and ensemble approaches. Gray- and black-box models are depicted with gray and black rectangles, respectively. For the sake of simplicity, the ensemble is depicted with only 2 black-box local models, but it can in principle include any number of experts.
  • Figure 2: A scheme of the structure proposed for prediction, where the expertise of gray- and black-box models is combined based on their suitability (here indicated as trust) to describe the new data. For the sake of simplicity, we consider only 2 experts, but we could in principle include multiple experts of the two kinds.
  • Figure 3: Learning mixture weights: windowing strategy for $W=4$. Successive windows are depicted as rectangles of increasingly deep shades of blue; samples shared by two consecutive windows are filled with a line pattern and associated with the window dictating their weights.
  • Figure 4: Numerical example: true weights $\omega(t)$ used to generate the training set. The first component $\omega_{1}(t)$ is shown in black, while $\omega_{2}(t)$ is the dashed blue line.
  • Figure 5: Numerical example: true (black line) vs reconstructed weights (red dashed).
  • ...and 5 more figures

Theorems & Definitions (18)

  • Remark 1: On the regressor $x(t)$
  • Remark 2: Weights and features
  • Lemma 1: Relation with jump models fitting
  • Proof
  • Proposition 1: A probabilistic view on eq:fitting_cost
  • Proof
  • Remark 3: Using quadratic regularization
  • Proposition 2: Statistical characterization of eq:cost_mixture
  • Proof
  • Remark 4: An interpretation of eq:least_sq_mix
  • ...and 8 more