A method of supervised learning from conflicting data with hidden contexts

Tianren Zhang, Yizhou Jiang, Feng Chen

TL;DR

This work proposes LEAF, a method that introduces an allocation function that learns to assign conflicting data to different predictive models. It also connects LEAF to a variant of the Expectation-Maximization algorithm, which yields an analytical expression for the allocation function.

Abstract

Conventional supervised learning assumes a stable input-output relationship. However, this assumption fails in open-ended training settings where the input-output relationship depends on hidden contexts. In this work, we formulate a more general supervised learning problem in which training data is drawn from multiple unobservable domains, each potentially exhibiting a distinct input-output map. This inherent conflict in the data renders standard empirical risk minimization (ERM) ineffective. To address this challenge, we propose LEAF, a method that introduces an allocation function which learns to assign conflicting data to different predictive models. We establish a connection between LEAF and a variant of the Expectation-Maximization algorithm, which allows us to derive an analytical expression for the allocation function. Finally, we provide a theoretical analysis of LEAF and empirically validate its effectiveness on both synthetic and real-world tasks involving conflicting data.
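The abstract describes an allocation function that assigns conflicting examples to different predictive models and links it to a variant of Expectation-Maximization. The paper's analytical allocation function is not given in this summary, so the Python below is only a generic stand-in: a hard-EM (k-lines regression) sketch, assuming k linear predictors, squared-error loss, and an argmin allocation. The names `fit_line`, `sq_err`, and `hard_em_allocation`, and the residual-based initialization, are hypothetical and are not LEAF's API.

```python
# Hard-EM allocation sketch (hypothetical; NOT the paper's LEAF allocation
# function). Assumes k linear models y = a*x + b and squared-error loss.


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def sq_err(model, x, y):
    """Squared error of one example under one linear model."""
    a, b = model
    return (y - (a * x + b)) ** 2


def hard_em_allocation(xs, ys, k, max_iters=50):
    """Alternate between refitting k models (M-step) and reallocating each
    example to the model with the lowest squared error (hard E-step)."""
    n = len(xs)
    # Hypothetical initialization: fit a single ERM line, sort examples by
    # residual, and split that order into k contiguous groups.
    a0, b0 = fit_line(xs, ys)
    order = sorted(range(n), key=lambda i: ys[i] - (a0 * xs[i] + b0))
    assign = [0] * n
    for rank, i in enumerate(order):
        assign[i] = rank * k // n
    models = [(a0, b0)] * k
    for _ in range(max_iters):
        # M-step: refit each model on the examples currently allocated to it.
        for j in range(k):
            idx = [i for i in range(n) if assign[i] == j]
            if len(idx) >= 2:  # keep the previous model if a group is too small
                models[j] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        # Hard E-step: allocate each example to its best-fitting model.
        new_assign = [
            min(range(k), key=lambda j: sq_err(models[j], xs[i], ys[i]))
            for i in range(n)
        ]
        if new_assign == assign:  # allocation is stable: stop
            break
        assign = new_assign
    return models, assign


# Toy data: three conflicting maps y = x + c observed on the same inputs.
xs = [i / 10 for i in range(-10, 11)]
X = [x for c in (-3.0, 0.0, 3.0) for x in xs]
Y = [x + c for c in (-3.0, 0.0, 3.0) for x in xs]
models, assign = hard_em_allocation(X, Y, k=3)
for a, b in sorted(models, key=lambda m: m[1]):
    print(f"y = {a:.2f}*x {b:+.2f}")
```

On this toy data the allocation is stable after one refit, with slopes near 1 and intercepts near -3, 0, and 3. Hard EM only reaches a local optimum, so the initialization matters: with a random initial allocation, two crossing maps such as y = x and y = -x can get stuck in a V-shaped split (one model takes the points above the crossing, the other the points below).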

Paper Structure

This paper contains 56 sections, 11 theorems, 63 equations, 9 figures, 8 tables, and 1 algorithm.

Key Result

Theorem 3.1: Identifiability of conflicting domains

Assume that all target functions are realizable. Then, the following two propositions are equivalent:

Figures (9)

  • Figure 1: Comparison between classical supervised learning (SL) and our formulation. Classical SL considers observable domains (datasets) with stable input-output maps; we consider unobservable domains with potentially conflicting input-output maps.
  • Figure 2: The regression task we consider, on which ERM fails.
  • Figure 3: Results on the regression task where the examples are simultaneously sampled from three heterogeneous functions. LEAF is the only method that smoothly recovers all functions.
  • Figure 4: Classification datasets with parallel, hierarchical, and opposite input-output relationships.
  • Figure 5: Visualization of different low-level feature spaces learned by LEAF models on Fashion Product Images.
  • ...and 4 more figures
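Figures 2 and 3 refer to regression tasks on which ERM fails. The Python below is a hypothetical toy, not the paper's task: the same inputs are labeled by two conflicting maps, y = x and y = -x, and a single least-squares line (plain ERM under squared loss) is fit to the pooled data. The helper `fit_line` is an illustrative name, not part of the paper.

```python
# Hypothetical toy (not the paper's Figure 2 task): ERM on conflicting data.


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


xs = [i / 10 for i in range(-10, 11)]
# Two conflicting domains share the same inputs: y = x and y = -x.
X = xs + xs
Y = [x for x in xs] + [-x for x in xs]

# A single ERM model is fit to the pooled data, ignoring the hidden domain.
a, b = fit_line(X, Y)
mse = sum((y - (a * x + b)) ** 2 for x, y in zip(X, Y)) / len(X)
print(f"ERM fit: y = {a:.3f}*x {b:+.3f}, MSE = {mse:.3f}")
```

The fitted line is approximately y = 0, which matches neither domain, and its MSE is about 0.367. This is expected: under squared loss the ERM minimizer at each input is the conditional mean of y, which here is (x + (-x)) / 2 = 0, so no single predictor can fit both maps.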

Theorems & Definitions (27)

  • Definition 2.1: Conflict domains
  • Definition 2.2: Conflict rank
  • Definition 2.3: LEAF objective
  • Theorem 3.1: Identifiability of conflicting domains
  • Remark 3.2
  • Theorem 3.3: PAC learnability
  • Remark 3.4
  • Theorem 3.5: Generalization error
  • Remark 3.6
  • Theorem 3.1: Identifiability of conflicting domains
  • ...and 17 more