Minimal Variance Model Aggregation: A principled, non-intrusive, and versatile integration of black box models

Théo Bourdais; Houman Owhadi

TL;DR

MEVA reframes ensemble aggregation as a variance-minimization problem for a non-intrusive collection of black-box predictors, using a pointwise linear aggregate $M_A(x)=\alpha^*(x)^T M(x)$. The key theoretical contribution shows a decomposition $\alpha^* = \lambda\alpha_V + (1-\lambda)\alpha_R$ and compares the asymptotic behavior of the variance-minimizing estimator $\hat{\alpha}_V$ against the empirical-error estimator $\hat{\alpha}_E$, proving MEVA can converge faster in scarce-data regimes. Practically, MEVA estimates a diagonal error covariance $A(x)$ via a fixed eigenbasis to obtain $\alpha_V(x)=\mathrm{Softmax}(-\lambda(x))$, where the log-eigenvalues $\lambda_i(x)$ are learned from data. The framework is validated on a Boston housing regression task and on PDE solver aggregation for Laplace and Burgers equations, showing robust improvements over MEEA and enabling effective operator-learning-based aggregation of diverse solvers.
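To make the two formulas in this summary concrete, here is a minimal NumPy sketch of the aggregation step only, assuming the log-eigenvalues $\lambda_i(x)$ have already been estimated and can be read as log error variances of each model. The function names and the numbers are illustrative assumptions, not taken from the paper. Under that reading, $\mathrm{Softmax}(-\lambda(x))$ is ordinary inverse-variance weighting, since $e^{-\log \sigma_i^2} = 1/\sigma_i^2$.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability; the result is unchanged.
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def aggregate(predictions, log_variances):
    """Pointwise linear aggregate M_A(x) = alpha(x)^T M(x).

    predictions, log_variances: arrays of shape (n_points, n_models).
    Returns the aggregate (n_points,) and the weights (n_points, n_models).
    """
    weights = softmax(-log_variances, axis=-1)
    return np.sum(weights * predictions, axis=-1), weights

# Hypothetical numbers: three models evaluated at two points.
predictions = np.array([[1.0, 2.0, 4.0],
                        [0.0, 1.0, 1.0]])
variances = np.array([[1.0, 1.0, 2.0],
                      [0.5, 0.5, 0.5]])
aggregated, weights = aggregate(predictions, np.log(variances))
```

At the first point the third model has twice the error variance of the others, so it receives half their weight; at the second point all variances are equal and the aggregate reduces to the uniform average.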

Abstract

Whether deterministic or stochastic, models can be viewed as functions designed to approximate a specific quantity of interest. We introduce Minimal Empirical Variance Aggregation (MEVA), a data-driven framework that integrates predictions from various models, enhancing overall accuracy by leveraging the individual strengths of each. This non-intrusive, model-agnostic approach treats the contributing models as black boxes and accommodates outputs from diverse methodologies, including machine learning algorithms and traditional numerical solvers. We advocate for a point-wise linear aggregation process and consider two methods for optimizing this aggregate: Minimal Error Aggregation (MEA), which minimizes the prediction error, and Minimal Variance Aggregation (MVA), which focuses on reducing variance. We prove a theorem showing that MVA can be more robustly estimated from data than MEA, making MEVA superior to Minimal Empirical Error Aggregation (MEEA). Unlike MEEA, which interpolates target values directly, MEVA formulates aggregation as an error estimation problem, which can be performed using any backbone learning paradigm. We demonstrate the versatility and effectiveness of our framework across various applications, including data science and partial differential equations, illustrating its ability to significantly enhance both robustness and accuracy.
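The idea of treating aggregation as an error estimation problem can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the backbone here is plain linear least squares on the log squared errors of each model, the data are synthetic, and all function names are hypothetical. The paper's own loss and error model are not reproduced.

```python
import numpy as np

def fit_log_error_models(X, y, P, eps=1e-12):
    """Fit, for each model, a linear map from [1, x] to its log squared error.

    X: inputs (n, d); y: targets (n,); P: model predictions (n, m).
    Returns coefficients of shape (d + 1, m).
    """
    features = np.hstack([np.ones((X.shape[0], 1)), X])
    log_sq_err = np.log((P - y[:, None]) ** 2 + eps)
    coef, *_ = np.linalg.lstsq(features, log_sq_err, rcond=None)
    return coef

def aggregate(X, P, coef):
    """Weight each model by the softmax of minus its predicted log error."""
    features = np.hstack([np.ones((X.shape[0], 1)), X])
    z = -(features @ coef)
    z -= z.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(z)
    w /= w.sum(axis=1, keepdims=True)
    return np.sum(w * P, axis=1), w

def make_data(rng, n):
    # Synthetic black boxes: model 0 is accurate for x < 0, model 1 for x > 0.
    X = rng.uniform(-1.0, 1.0, size=(n, 1))
    y = np.sin(3.0 * X[:, 0])
    scale = 0.1 * np.stack([np.exp(2 * X[:, 0]), np.exp(-2 * X[:, 0])], axis=1)
    P = y[:, None] + scale * rng.standard_normal((n, 2))
    return X, y, P

rng = np.random.default_rng(0)
X_train, y_train, P_train = make_data(rng, 2000)
X_test, y_test, P_test = make_data(rng, 500)

coef = fit_log_error_models(X_train, y_train, P_train)
agg, w = aggregate(X_test, P_test, coef)

mse_models = np.mean((P_test - y_test[:, None]) ** 2, axis=0)
mse_agg = np.mean((agg - y_test) ** 2)
```

Note that the targets enter only through the errors of the individual models; the aggregate never interpolates `y` directly, which is the distinction the abstract draws between MEVA and MEEA. Any regressor could replace the least-squares fit.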

Paper Structure (46 sections, 1 theorem, 86 equations, 8 figures, 2 tables)

Introduction
Related work
Contributions
The minimal error aggregation
Best linear aggregate when model correlations are known
Remark
Data-driven aggregation
Pathological example: A dubious trend
Aggregation through error estimation: the Minimal Variance Aggregate (MVA)
Modelling the error
A theorem on the superiority of MEVA over MEEA
Computing the aggregation: the Minimal Empirical Variance Aggregate (MEVA)
Experiments
Sanity check: aggregation on the Boston housing dataset
Aggregation of PDE solvers
...and 31 more sections

Key Result

Theorem 1

Assume $s,u,t\neq 0$. Then there exist two sequences of random variables, $K^E_N$ and $K^V_N$, each of which converges in distribution to a distinct finite random variable, such that

Figures (8)

Figure 1: Pathological example (sec. \ref{sec:pathological_example_1})
Figure 2: Comparisons of the performance of the different models on the Boston housing dataset. Red bars are the performance of models trained using the training set and used in the aggregation. Blue bars show models trained on the training and validation set (train+val) to get a fair comparison with the aggregations. The aggregation does not use the models trained on train+val.
Figure 3: $\log$ MSE of the different methods for: (\ref{fig:laplace-aggregation}) the Laplace equation; (\ref{fig:burgers-aggregation}) Burgers' equation. Samples are sorted by the error of the aggregate.
Figure 4: (a) Real solution of the PDE (b) One of the models aggregated (c) Uniform average of all models (d) Proposed aggregate (equation \ref{eq:L2_def_alpha})
Figure 5: Pathological example (sec. \ref{sec:pathological_example_1})
...and 3 more figures

Theorems & Definitions (1)

Theorem 1
