Domain Generalization by Functional Regression

Markus Holzleitner, Sergei V. Pereverzyev, Werner Zellinger

TL;DR

The paper addresses domain generalization by reframing it as a functional regression problem and introduces a linear operator $G$ that maps kernel mean embeddings of input marginals to domain-specific regression functions, i.e., $f_P(\cdot)=G\,m_{P_\mathcal{X}}(\cdot)+\varepsilon(\cdot)$ with $G\,m_{P_\mathcal{X}}(\cdot)=a_0(\cdot)+\int m_{P_\mathcal{X}}(x)\,\beta(\cdot,x)\,dx$. The authors propose a two-step algorithm: (i) per-domain ridge regression to estimate $f_{\mathbf{z}^{(i)}}^{\lambda_i}$ in domain-specific RKHSs, and (ii) regularized learning of a functional slope $\beta$ in a shared RKHS to construct $G$ and the final predictor $g$, which can use different kernels across domains. They derive finite-sample bounds showing that the excess risk decays at rate $N^{-{1/(1+c_6)}}$ (up to problem-dependent constants) and demonstrate with a numerical example that the method outperforms pooling and marginal transfer baselines. The work lays groundwork for first-principles, finite-sample analysis in domain generalization via operator learning and offers practical, domain-specific predictor construction with open-source code.
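The two-step procedure summarized above can be illustrated with a minimal sketch. It assumes one-dimensional inputs, a Gaussian kernel, a hypothetical toy data generator, and a grid discretization of the integral $\int m_{P_\mathcal{X}}(x)\,\beta(\cdot,x)\,dx$, with plain ridge regression standing in for the RKHS-regularized estimate of $\beta$. It is a simplified stand-in for illustration, not the authors' estimator or their released code; all function names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian kernel matrix between two 1-D point sets."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def fit_domain_ridge(x, y, lam):
    """Step (i): kernel ridge regression coefficients for one source domain."""
    n = len(x)
    return np.linalg.solve(gaussian_kernel(x, x) + n * lam * np.eye(n), y)

def mean_embedding(x, grid):
    """Empirical kernel mean embedding (1/n) * sum_j k(t, x_j), evaluated on grid."""
    return gaussian_kernel(grid, x).mean(axis=1)

def fit_operator(M, F, dx, lam):
    """Step (ii), discretized: ridge fit of F ~ a0 + dx * M @ B (B stands in for beta)."""
    m_bar, f_bar = M.mean(axis=0), F.mean(axis=0)
    Z = dx * (M - m_bar)
    B = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (F - f_bar))
    a0 = f_bar - dx * (m_bar @ B)
    return a0, B

# Hypothetical toy setup: domain i draws inputs around a mean mu_i, and its
# regression function depends on mu_i (an assumption made for illustration).
grid = np.linspace(-3.0, 3.0, 40)
dx = grid[1] - grid[0]
M_rows, F_rows = [], []
for mu in rng.uniform(-1.0, 1.0, size=15):
    x = rng.normal(mu, 0.5, size=60)
    y = np.sin(x) + mu + 0.1 * rng.normal(size=60)
    alpha = fit_domain_ridge(x, y, lam=1e-2)
    M_rows.append(mean_embedding(x, grid))           # embedding of the input marginal
    F_rows.append(gaussian_kernel(grid, x) @ alpha)  # fitted function on the grid
M, F = np.array(M_rows), np.array(F_rows)

a0, B = fit_operator(M, F, dx, lam=1e-3)

# New target domain: only unlabeled inputs are used to build the predictor.
x_target = rng.normal(0.3, 0.5, size=60)
f_target = a0 + dx * (mean_embedding(x_target, grid) @ B)
```

Here `f_target` holds the predicted regression function for the target domain on `grid`, obtained from unlabeled target inputs alone. The sketch uses a single kernel for every domain to stay short, whereas the paper's construction allows domain-specific kernels.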

Abstract

The problem of domain generalization is to learn, given data from different source distributions, a model that can be expected to generalize well on new target distributions which are only seen through unlabeled samples. In this paper, we study domain generalization as a problem of functional regression. Our concept leads to a new algorithm for learning a linear operator from marginal distributions of inputs to the corresponding conditional distributions of outputs given inputs. Our algorithm allows a source distribution-dependent construction of reproducing kernel Hilbert spaces for prediction and satisfies finite sample error bounds for the idealized risk. Numerical implementations and source code are available.

Paper Structure

This paper contains 17 sections, 7 theorems, 31 equations, and 1 figure.

Key Result

Lemma 1

The kernel mean embedding $m_{P'}$ of any $P'\in\mathcal{M}_1^+(\mathcal{X})$ w.r.t. a kernel $k\in\mathcal{K}$ is bounded in $L^2(P)$-norm for any $P\in\mathcal{M}_1^+(\mathcal{X})$ by

Figures (1)

  • Figure 1: Our approach maps kernel mean embeddings of input distributions (a) to regression functions (b, dashed) and can outperform ridge regression on pooled data (c, dashed), in contrast to blanchard2021domain (also c, dashed). This is illustrated by four random test predictions of our approach (d, dashed).

Theorems & Definitions (11)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Lemma 1
  • Lemma 2
  • Lemma 3: wolfer2022variance, Section 2, Remark 2.1
  • Lemma 4
  • Lemma 5
  • Theorem 6
  • ...and 1 more