Quantifying Correlations of Machine Learning Models

Yuanyuan Li, Neeraj Sarna, Yang Lin

TL;DR

The paper tackles the risk of cascading failures arising from correlated errors when multiple ML models operate in safety-critical settings. It defines three practical correlation scenarios and proposes a formal framework to quantify cross-model error correlations, using $\rho_{\hat{f}_1, \hat{f}_2}$ for regression and $\phi_K$ for classification, complemented by an average-error measure $E_{\text{avg}}$. Through an extensive empirical study across tabular, image, and text domains (including downstream fine-tuning of foundation models), it shows that similar algorithms on the same data, overlapping predictive features, and shared foundation-model weights all produce substantial error correlations. The results underline pervasive cross-model dependencies and motivate governance and mitigation strategies to prevent cascading failures in multi-model AI deployments.

Abstract

Machine learning models are used extensively in safety-critical applications, where errors from these models could cause harm to the user. Such risks are amplified when multiple concurrently deployed machine learning models interact and make errors simultaneously. This paper explores three scenarios where error correlations between multiple models arise, resulting in such aggregated risks. Using real-world data, we simulate these scenarios and quantify the correlations in errors of different models. Our findings indicate that aggregated risks are substantial, particularly when models share similar algorithms, training datasets, or foundational models. Overall, we observe that correlations across models are pervasive and likely to intensify with increased reliance on foundational models and widely used public datasets, highlighting the need for effective mitigation strategies to address these challenges.
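To make the idea of correlated errors concrete, the following is a minimal sketch of how a Pearson correlation between the error terms of two regression models might be computed. It is an illustration only, not the authors' implementation: the helper names `pearson` and `error_correlation`, the signed-error convention (prediction minus target), and the synthetic data are all assumptions, and the paper's exact definition of $\rho_{\hat{f}_1, \hat{f}_2}$ is not reproduced in this summary.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def error_correlation(y_true, pred_1, pred_2):
    """Pearson correlation between the signed errors of two models.

    Assumption: the error of a model is (prediction - target); the paper
    may define the error term differently.
    """
    errors_1 = [p - y for p, y in zip(pred_1, y_true)]
    errors_2 = [p - y for p, y in zip(pred_2, y_true)]
    return pearson(errors_1, errors_2)


# Synthetic illustration (hypothetical data, not from the paper).
rng = random.Random(0)
n = 2000
y_true = [rng.gauss(0, 1) for _ in range(n)]

# Models A and B share one error source (e.g. the same training data),
# plus a small independent component each.
shared = [rng.gauss(0, 1) for _ in range(n)]
pred_a = [y + s + rng.gauss(0, 0.5) for y, s in zip(y_true, shared)]
pred_b = [y + s + rng.gauss(0, 0.5) for y, s in zip(y_true, shared)]

# Model C errs independently of A and B.
pred_c = [y + rng.gauss(0, 1) for y in y_true]

# Population value for A vs. B is 1 / (1 + 0.5**2) = 0.8; for A vs. C it is 0.
rho_ab = error_correlation(y_true, pred_a, pred_b)
rho_ac = error_correlation(y_true, pred_a, pred_c)
print(f"rho(A, B) = {rho_ab:.3f}")
print(f"rho(A, C) = {rho_ac:.3f}")
```

In this toy setup, the two models that share an error source show a high error correlation, while the independent model does not, which is the kind of dependency the paper sets out to measure on real data.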

Paper Structure

This paper contains 13 sections, 5 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Pearson correlations between errors of different models on the same data (Scenario 1; tabular data). The errors are computed on the California Housing dataset. Darker colors represent higher correlations.
  • Figure 2: $\phi_K$ correlations between errors of different models on the same data (Scenario 1; image data). The errors are computed on the CIFAR-10 dataset.
  • Figure 3: $\phi_K$ correlations between errors of different models on the same data (Scenario 1; text data). The errors are computed on the financial_phrasebank dataset.
  • Figure 4: Pearson correlations between errors of different models with overlapping features (Scenario 2; tabular data) on the California Housing dataset.
  • Figure 5: Feature importance of the XGBoost model on the California Housing dataset.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Definition 1: Correlation of error terms across models
  • Definition 2: Correlations of performance between fine-tuned models
  • Remark 1: Closed-form solution
  • Remark 2: Correlation coefficients