
Domain Generalisation via Imprecise Learning

Anurag Singh, Siu Lun Chau, Shahine Bouabid, Krikamol Muandet

TL;DR

This work tackles out-of-distribution generalisation under deployment uncertainty by introducing Imprecise Domain Generalisation (IDG), which lets a learner optimise against a continuum of generalisation strategies without committing to a single notion during training. The framework combines imprecise risk optimisation (IRO) at training time with a deployment-time framework in which the operator tunes risk aversion via the CVaR level $\lambda\in[0,1]$; the learner's imprecision over generalisation notions is represented by a credal set ${\mathbb K}(Z)$. Core theoretical contributions include Continuous-Pareto (C-Pareto) optimality for augmented hypotheses ${\mathcal H}_{\Lambda}$, a gradient-based IRO algorithm with a high-probability excess risk bound of order $O(n^{-1/2}+m^{-1/2})$, and results showing that scalarising across $\Lambda$ preserves Bayes-optimality on the support. Empirically, IDG generalises robustly across synthetic domains, CMNIST with domain-specific risk, and real-world Bike Rentals, achieving low maximal regret and competitive performance across operator preferences. Overall, the approach decouples learning from the deployment-time choice of generalisation notion, offering a principled way to handle generalisation uncertainty in practical OOD settings.
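To make the role of the CVaR level concrete, the following is a minimal sketch of empirical CVaR over a handful of per-domain risks, using the Rockafellar-Uryasev variational form. It is not the paper's implementation: the function name `cvar`, the uniform weighting over domains, and the orientation (here $\lambda=0$ recovers the average-case risk and $\lambda=1$ the worst-case risk) are assumptions made for illustration.

```python
def cvar(risks, lam):
    """Empirical CVaR at level `lam` of per-domain risks (uniform weights assumed).

    Uses the Rockafellar-Uryasev form  min_c  c + E[(R - c)_+] / (1 - lam).
    For a finite sample the minimum is attained at one of the observed risks,
    so it is enough to evaluate the objective at each of them.
    """
    risks = [float(r) for r in risks]
    if not risks:
        raise ValueError("need at least one domain risk")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if lam == 1.0:
        # Limiting case: the variational form degenerates to the worst-case risk.
        return max(risks)
    n = len(risks)
    best = float("inf")
    for c in risks:
        # Mean excess of the risks over the threshold c.
        mean_excess = sum(max(r - c, 0.0) for r in risks) / n
        best = min(best, c + mean_excess / (1.0 - lam))
    return best


# Hypothetical risks of one model on four training domains.
domain_risks = [0.1, 0.2, 0.3, 0.9]
average_case = cvar(domain_risks, 0.0)   # mean of all four risks
interpolated = cvar(domain_risks, 0.5)   # mean of the worst half
worst_case = cvar(domain_risks, 1.0)     # maximum risk
```

Sweeping `lam` from 0 to 1 traces out the spectrum between average-case and worst-case risk that the operator selects from at deployment.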

Abstract

Out-of-distribution (OOD) generalisation is challenging because it involves not only learning from empirical data, but also deciding among various notions of generalisation, e.g., optimising the average-case risk, worst-case risk, or interpolations thereof. While this choice should in principle be made by the model operator, such as a medical doctor, this information might not always be available at training time. Because of this deployment uncertainty, the institutional separation between machine learners and model operators leads machine learners to commit arbitrarily to specific generalisation strategies. We introduce the Imprecise Domain Generalisation framework to mitigate this, featuring an imprecise risk optimisation that allows learners to stay imprecise by optimising against a continuous spectrum of generalisation strategies during training, and a model framework that allows operators to specify their generalisation preference at deployment. Supported by both theoretical and empirical evidence, our work showcases the benefits of integrating imprecision into domain generalisation.

Paper Structure (36 sections, 13 theorems, 52 equations, 4 figures, 4 tables, 3 algorithms)


Key Result

Lemma 3.1

The binary relation $\succeq$ represented by ${\mathbb K}(Z)$ is such that for $f, g\in{\mathcal{H}}$, $f\succeq g$, if and only if $\mathbb{E}_{{\mathbb P}}[Z_f] \leq \mathbb{E}_{{\mathbb P}}[Z_g]$ for every ${\mathbb P}\in{\mathbb K}(Z)$.
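The relation in Lemma 3.1 can be illustrated with a small sketch. It assumes, purely for illustration, that the credal set ${\mathbb K}(Z)$ is stood in for by a finite list of probability vectors over domains and that $Z_f$ is given as a vector of per-domain risks; the names `expected_risk` and `weakly_preferred` are hypothetical and do not come from the paper.

```python
def expected_risk(p, z):
    """Expectation of per-domain risks z under a distribution p over domains."""
    return sum(p_d * z_d for p_d, z_d in zip(p, z))


def weakly_preferred(z_f, z_g, credal_set, tol=1e-12):
    """True iff E_P[Z_f] <= E_P[Z_g] for every P in the (finite) credal set."""
    return all(
        expected_risk(p, z_f) <= expected_risk(p, z_g) + tol
        for p in credal_set
    )


# Hypothetical finite stand-in for the credal set over three domains.
credal_set = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
]

z_f = [0.2, 0.4, 0.3]  # per-domain risks of hypothesis f
z_g = [0.3, 0.5, 0.3]  # g is no better than f on any domain
z_h = [0.1, 0.9, 0.3]  # h beats f on domain 0 but loses on domain 1

f_over_g = weakly_preferred(z_f, z_g, credal_set)
f_over_h = weakly_preferred(z_f, z_h, credal_set)
h_over_f = weakly_preferred(z_h, z_f, credal_set)
```

Because the inequality must hold for every distribution in the set, some pairs such as `f` and `h` are incomparable: neither is preferred to the other.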

Figures (4)

  • Figure 1: An illustration of our proposed imprecise learning framework. We allow learners to stay imprecise to avoid over-committing in light of generalisation uncertainty. Instead, we defer the choice of a precise generalisation notion to the operator.
  • Figure 2: Experiments comparing imprecise learning (IL) with various precise learners with precise hypothesis ($\textbf{PL-}f$) and with augmented hypothesis ($\textbf{PL-}\bar{h}$). Experiments are repeated 5 times and 1 standard deviation is shown.
  • Figure 3: The sub-figures show (i) the data and the ideal learner $f_{\lambda}(\hat{\theta})\in{\mathcal{H}}$ for $\lambda \in \{0.05, \dots, 0.95\}$; (ii) the landscape of the objective function $\rho$ (CVaR) for the ideal learner, with $\hat{\theta}$ plotted as circles; (iii) the risk profile for $\lambda \in \{0.05, \dots, 0.95\}$ for the ideal learner; and (iv) the risk profile for $\lambda \in \{0.05, \dots, 0.95\}$ for the imprecise learner.
  • Figure 4: One sub-figure describes the features that affect the target: the mechanism by which color affects the target changes across environments, whereas shape has a stable mechanism across environments. A second sub-figure shows the long-tailed distribution of environments from which we sample training environments. This is often realistic, since many subpopulations are underrepresented in training data, e.g., low-resource languages for translation tasks.

Theorems & Definitions (28)

  • Lemma 3.1
  • Definition 3.2: C-Pareto optimality
  • Definition 3.3: C-Pareto stationary
  • Definition 3.4: Conditional Value-at-Risk (Rockafellar & Uryasev, 2002)
  • Definition 3.5: C-Pareto optimal augmented hypothesis
  • Proposition 3.6
  • Proposition 3.7
  • Theorem 4.1
  • Proposition 4.2
  • proof
  • ...and 18 more