Epistemic Errors of Imperfect Multitask Learners When Distributions Shift

Sabina J. Sloman, Michele Caprio, Samuel Kaski

TL;DR

This paper tackles how uncertainty-aware learners incur epistemic errors when faced with distribution shift in a multitask setting. It introduces a principled, decomposable epistemic error bound that separates model restrictions, data scarcity, and distribution shift as distinct sources of reducible error, formalized via total variation distance. The main result specializes to Bayesian transfer learning and to distribution shift within total variation ε-neighborhoods, and the authors discuss negative transfer and the practical computability of the bound terms. The framework provides a diagnostic tool for when and how to reduce epistemic error by choosing appropriate inductive biases, data acquisition, or robust training strategies, with implications for uncertainty quantification in high-stakes domains. Limitations and future work point to extending the theory to conformal prediction and validating the bounds in more complex, real-world settings, such as shifts over time and higher-dimensional problems.

Abstract

Uncertainty-aware machine learners, such as Bayesian neural networks, output a quantification of uncertainty instead of a point prediction. In this work, we provide uncertainty-aware learners with a principled framework to characterize, and identify ways to eliminate, errors that arise from reducible (epistemic) uncertainty. We introduce a principled definition of epistemic error, and provide a decompositional epistemic error bound which operates in the very general setting of imperfect multitask learning under distribution shift. In this setting, the training (source) data may arise from multiple tasks, the test (target) data may differ systematically from the source data tasks, and/or the learner may not arrive at an accurate characterization of the source data. Our bound separately attributes epistemic errors to each of multiple aspects of the learning procedure and environment. As corollaries of the general result, we provide epistemic error bounds specialized to the settings of Bayesian transfer learning and distribution shift within $ε$-neighborhoods. We additionally leverage the terms in our bound to provide a novel definition of negative transfer.

Paper Structure

This paper contains 45 sections, 22 theorems, 61 equations, and 3 figures.

Key Result

Lemma 1

Given a model class $\pi$, a predictor $\widehat{P} \in \pi$, a second-order bounded source task distribution $\mathcal{Q}^{S}$, $\widehat{P} = \overline{\mathcal{Q}^{S}}$ (perfect learning), and $\mathcal{Q}^{T} = \mathcal{Q}^{S}$ (no distribution shift),
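The setting of Lemma 1 (perfect learning, no distribution shift) can be illustrated with a toy computation. This is a minimal sketch under our own assumptions, not the paper's definitions: tasks are discrete distributions over a finite outcome set, the source task distribution is uniform over a finite list of tasks, the barycenter is their equal-weight mixture, and epistemic error is measured as the average TV distance between the predictor and a target task drawn from that same list. The helper names `tv_distance`, `barycenter`, and `perfect_learner_error` are hypothetical. Under these assumptions, a predictor equal to the barycenter still incurs error whenever the tasks differ, which matches the dependence on task variability named in the lemma's title.

```python
from typing import List, Sequence

def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """TV distance between two discrete distributions: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def barycenter(tasks: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight mixture of the tasks (assumed form of the barycenter)."""
    return [sum(col) / len(tasks) for col in zip(*tasks)]

def perfect_learner_error(tasks: Sequence[Sequence[float]]) -> float:
    """Average TV distance from the barycenter to each task.

    The predictor is set to the barycenter (perfect learning), and target tasks are
    drawn from the same list as the source tasks (no distribution shift).
    """
    p_hat = barycenter(tasks)
    return sum(tv_distance(p_hat, q) for q in tasks) / len(tasks)

identical = [[0.25, 0.75], [0.25, 0.75]]  # no task variability
diverse = [[0.75, 0.25], [0.25, 0.75]]    # tasks disagree

print(perfect_learner_error(identical))  # 0.0
print(perfect_learner_error(diverse))    # 0.25
```

Both task lists are handled by the same perfectly learned predictor, so the only thing that changes the error is how spread out the tasks are.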

Figures (3)

  • Figure 1: $\mathbf{e}$ as a function of (a) convergence of the posterior distribution and (b) neighborhood size. For each neighborhood size $\epsilon$, the plots show 500 simulations, where variation is with respect to the set of source tasks and the target task sampled from their respective task distributions.
  • Figure 2: Panels (a), (b), and (c) show schematically how additional learning from the source data (a decreased distance of a predictor from $\overline{\mathcal{Q}^{S}}$) can have different effects on the margin of epistemic error (the distance to a target task $Q^t$, indicated by the solid lines).
  • Figure 3: $\mathbf{e}$ incurred in experiments designed to reflect the settings in Figures 2a, 2b, and 2c, respectively (the horizontal axes of those panels represent the value of $\beta_1$ and the vertical axes represent the value of $\beta_2$). Looseness in the epistemic error bound is computed as $\overline{\mathrm{d_{TV}}}\left( \widehat{P}, Q^{t} \right) - \left( \overline{\mathrm{d_{TV}}}\left( \widehat{P}, P_{\star} \right) + \overline{\mathrm{d_{TV}}}\left( \overline{\mathcal{Q}^{S}}, \overline{\mathcal{Q}^{T}} \right) \right) \approx \mathbf{e} - \left( \mathcal{C} + \mathcal{D} \right)$ (recalling that in this case $P_{\star} = \overline{\mathcal{Q}^{S}}$). Lines and error bars denote means and standard errors, respectively, across 500 simulations, where variation is with respect to the set of source tasks and the target task sampled from their respective task distributions.
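The contrast drawn in Figures 2 and 3, where additional learning from the source data can either help or hurt, can be reproduced in a toy Bernoulli example. This is a minimal sketch under stated assumptions: each task is a Bernoulli distribution (so TV distance reduces to |p - q|), the predictor moves along a straight line from a prior value toward the source barycenter, $P_{\star}$ equals the source barycenter as in the Figure 3 caption, and the target task distribution is a point mass. The names `tv_bernoulli`, `error_terms`, and `learning_path`, and all numbers, are hypothetical. The last entry returned by `error_terms` is the caption's quantity e - (C + D).

```python
from typing import List, Tuple

def tv_bernoulli(p: float, q: float) -> float:
    """TV distance between Bernoulli(p) and Bernoulli(q); it reduces to |p - q|."""
    return abs(p - q)

def error_terms(p_hat: float, src_bar: float, tgt_bar: float,
                target_task: float) -> Tuple[float, float, float, float]:
    """Return (e, C, D, e - (C + D)) for one predictor and one target task.

    Takes P_star to be the source barycenter, as in the Figure 3 caption.
    """
    e = tv_bernoulli(p_hat, target_task)  # error against the sampled target task
    c = tv_bernoulli(p_hat, src_bar)      # lack of convergence
    d = tv_bernoulli(src_bar, tgt_bar)    # shift between task barycenters
    return e, c, d, e - (c + d)

def learning_path(prior: float, src_bar: float, steps: List[float]) -> List[float]:
    """Hypothetical learner: the predictor moves linearly from `prior` to `src_bar`."""
    return [prior + lam * (src_bar - prior) for lam in steps]

SRC_BAR = 0.75  # e.g. the mixture of source tasks Bernoulli(0.625) and Bernoulli(0.875)
path = learning_path(prior=0.5, src_bar=SRC_BAR, steps=[0.0, 0.5, 1.0])

for label, target in [("positive", 0.875), ("negative", 0.25)]:
    # Point-mass target task distribution, so its barycenter is the target task itself.
    errors = [error_terms(p, SRC_BAR, target, target)[0] for p in path]
    print(label, errors)
# positive [0.375, 0.25, 0.125]
# negative [0.25, 0.375, 0.5]
```

In the first case the error shrinks as the predictor approaches the source barycenter; in the second it grows. This illustrates only the informal sense of negative transfer; the paper's formal definition is stated in terms of its bound terms.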

Theorems & Definitions (48)

  • Definition 1: Barycenter of $\mathcal{Q}$
  • Definition 2: Variability of $\mathcal{Q}$
  • Definition 3: First- and Second-Order Boundedness
  • Definition 4: Total variation (TV) distance ($\mathrm{d_{TV}}$)
  • Definition 5: Epistemic error
  • Definition 6: Epistemic error bound
  • Lemma 1: Epistemic error depends on task variability
  • Definition 7: Approximation bias ($\mathcal{B}$)
  • Definition 8: Lack of convergence ($\mathcal{C}$)
  • Lemma 2: Epistemic error depends on task variability, model restrictions, and data scarcity
  • ...and 38 more
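Several of the listed definitions can be sketched for discrete distributions. Treat this as an illustration under assumptions rather than the paper's formal statements: Definition 4 is taken to be the standard TV distance (half the L1 distance), Definition 1 is taken to be the equal-weight mixture of finitely many tasks, and the pairing of distances with $\mathcal{B}$, $\mathcal{C}$, and $\mathcal{D}$ is our reading of the definition titles and the Figure 3 caption. The fourth term, labeled `V`, is a hypothetical stand-in for task variability, and `random_dist` and `chain_terms` are illustrative helpers. The check is only the triangle inequality along the chain $\widehat{P} \to P_{\star} \to \overline{\mathcal{Q}^{S}} \to \overline{\mathcal{Q}^{T}} \to Q^{t}$, which shows how a bound built from these distances can attribute error to separate sources; it does not reproduce Lemma 2 or the main theorem.

```python
import random
from typing import Dict, List, Sequence, Tuple

def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """TV distance for discrete distributions: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def barycenter(tasks: Sequence[Sequence[float]]) -> List[float]:
    """Assumed form of Definition 1: the equal-weight mixture of finitely many tasks."""
    return [sum(col) / len(tasks) for col in zip(*tasks)]

def random_dist(k: int, rng: random.Random) -> List[float]:
    """Random probability vector over k outcomes (for illustration only)."""
    weights = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

def chain_terms(p_hat: Sequence[float], p_star: Sequence[float],
                src_tasks: Sequence[Sequence[float]], tgt_tasks: Sequence[Sequence[float]],
                target_task: Sequence[float]) -> Tuple[float, Dict[str, float]]:
    """Return the error against one target task and the four distances along the chain."""
    src_bar, tgt_bar = barycenter(src_tasks), barycenter(tgt_tasks)
    terms = {
        "C": tv_distance(p_hat, p_star),         # lack of convergence (assumed pairing)
        "B": tv_distance(p_star, src_bar),       # approximation bias (assumed pairing)
        "D": tv_distance(src_bar, tgt_bar),      # shift between task barycenters
        "V": tv_distance(tgt_bar, target_task),  # hypothetical task-variability term
    }
    return tv_distance(p_hat, target_task), terms

rng = random.Random(0)
src = [random_dist(4, rng) for _ in range(3)]
tgt = [random_dist(4, rng) for _ in range(3)]
err, terms = chain_terms(random_dist(4, rng), random_dist(4, rng), src, tgt, tgt[0])

# Triangle inequality: the error never exceeds the sum of the chain's distances.
assert err <= sum(terms.values()) + 1e-12
print(round(err, 3), {k: round(v, 3) for k, v in terms.items()})
```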