PAC Learning is just Bipartite Matching (Sort of)

Shaddin Dughmi

TL;DR

The paper argues that PAC learning can be understood through bipartite matching by employing a transductive learning model and one-inclusion graphs, recasting multiclass and other loss settings as matching problems. It develops a precise graph-theoretic characterization of optimal transductive learning via Hall complexity and extends the framework to general loss functions using Functional Dependency Structures (FDS), establishing a compactness principle that ties sample complexity to finite projections. It also outlines algorithmic templates derived from the matching viewpoint, including local regularization and unsupervised pre-training, offering a practical pathway for near-optimal multiclass learning. The work highlights equivalences and gaps between transductive and PAC models, proposes local computation avenues, and suggests future directions for unifying combinatorial optimization with learning theory through Hall-type and matching-based analyses.

Abstract

The main goal of this article is to convince you, the reader, that supervised learning in the Probably Approximately Correct (PAC) model is closely related to, of all things, bipartite matching! En route from PAC learning to bipartite matching, I will overview a particular transductive model of learning and its associated one-inclusion graphs, which can be viewed as a generalization of some of the hat puzzles that are popular in recreational mathematics. Although this transductive model is far from new, it has recently seen a resurgence of interest as a tool for tackling deep questions in learning theory. A secondary purpose of this article is to serve as a (biased) tutorial on the connections between the PAC and transductive models of learning.

Paper Structure

This paper contains 36 sections and 5 figures.

Figures (5)

  • Figure 1: Realizable binary classification for axis-aligned rectangles in the transductive model.
  • Figure 2: Example one-inclusion graph (OIG) for realizable binary classification on three datapoints.
  • Figure 3: Example OIG for realizable multiclass classification, in traditional and bipartite forms.
  • Figure 4: Matching on infinite bipartite graphs absent degree constraints.
  • Figure 5: A local regularizer may favor the simplicity of $h_1$ on test points drawn from the right region of the domain, and the simplicity of $h_2$ on test points drawn from the left region.