Fine-grained Analysis of Non-parametric Estimation for Pairwise Learning

Junyu Zhou, Shuo Huang, Han Feng, Puyu Wang, Ding-Xuan Zhou

TL;DR

This work relaxes the convexity and VC-class assumptions required by most prior work, establishes a sharp oracle inequality for the empirical risk minimizer over a general hypothesis space with Lipschitz continuous pairwise losses, and designs a targeted hypothesis space of structured deep ReLU networks with controllable complexity.

Abstract

In this paper, we are concerned with the generalization performance of non-parametric estimation for pairwise learning. Most existing work requires the hypothesis space to be convex or a VC-class, and the loss to be convex. However, these restrictive assumptions limit the applicability of the results to many popular methods, especially kernel methods and neural networks. We significantly relax these assumptions and establish a sharp oracle inequality for the empirical risk minimizer over a general hypothesis space with Lipschitz continuous pairwise losses. As an example, we apply our general results to pairwise least squares regression and derive an excess population risk bound that matches the minimax lower bound for pointwise least squares regression. The key novelty lies in constructing a structured deep ReLU neural network to approximate the true predictor, and in designing a targeted hypothesis space composed of networks with this structure and controllable complexity. Experiments validate the effectiveness of the proposed method. This example demonstrates that the general results can be used to study generalization performance for a variety of problems that existing approaches cannot handle.
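Pairwise learning evaluates the loss on pairs of examples rather than on single examples, so the empirical risk is an average over pairs. A minimal sketch of the empirical pairwise least squares risk follows, assuming the common formulation in which a pair $((x, y), (x', y'))$ has target $y - y'$ and squared loss $(f(x, x') - (y - y'))^2$, averaged over all $n(n-1)$ ordered pairs of distinct indices. The function name `pairwise_ls_empirical_risk` is hypothetical and not taken from the paper.

```python
import numpy as np


def pairwise_ls_empirical_risk(f, X, y):
    """Empirical pairwise least squares risk over all ordered pairs i != j.

    Assumed formulation (not necessarily the paper's exact one): the pair
    (x_i, x_j) has target y_i - y_j and loss (f(x_i, x_j) - (y_i - y_j))**2.
    `f` is any callable taking two inputs and returning a scalar.
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # skip the diagonal: pairs of an example with itself
                total += (f(X[i], X[j]) - (y[i] - y[j])) ** 2
    # Average over the n(n-1) ordered pairs of distinct indices.
    return total / (n * (n - 1))


# Usage: with y = 2x, the predictor f(x, x') = 2(x - x') fits every pair exactly.
X = np.array([[0.0], [1.0], [2.0]])
y = 2.0 * X[:, 0]
print(pairwise_ls_empirical_risk(lambda a, b: 2.0 * (a[0] - b[0]), X, y))
```

Minimizing this quantity over a hypothesis space of pairwise predictors gives an empirical minimizer of the kind the oracle inequality concerns. The double loop makes the quadratic number of pairs explicit; a vectorized version would trade that clarity for speed.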

Paper Structure (16 sections, 17 theorems, 104 equations, 10 figures, 3 tables)

Key Result

Lemma 1

For a VC-class $\mathcal{F}$ of functions with uniform bound $F$, one has for any probability measure $\rho$, for an absolute constant $C\!>\!0$ and $0 \!<\! \epsilon \!<\!1$, where $L^2_\rho$ denotes the $L^{2}$ norm with respect to $\rho_{{\boldsymbol{x}}}$.
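For orientation, a classical covering-number bound of the kind Lemma 1 refers to is the one given by van der Vaart and Wellner (Weak Convergence and Empirical Processes, Theorem 2.6.7). Specialized to the $L^2_\rho$ norm and the constant envelope $F$, it reads as below. Here $V(\mathcal{F})$ denotes the VC index of $\mathcal{F}$ (our notation), and the exact form, notation, and constants in the paper's Lemma 1 may differ.

```latex
\mathcal{N}\big(\epsilon F, \mathcal{F}, L^2_\rho\big)
  \le C \, V(\mathcal{F}) \, (16e)^{V(\mathcal{F})}
      \Big(\frac{1}{\epsilon}\Big)^{2(V(\mathcal{F}) - 1)},
  \qquad 0 < \epsilon < 1 .
```

Taking logarithms gives $\log \mathcal{N}(\epsilon F, \mathcal{F}, L^2_\rho) = O\big(V(\mathcal{F}) \log(1/\epsilon)\big)$ as $\epsilon \to 0$, which is the polynomial-in-$1/\epsilon$ covering behaviour usually associated with VC-classes.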

Figures (10)

  • Figure 1: Pointwise learning vs. Pairwise learning
  • Figure 2: Diagram of the relationships among theorems and assumptions
  • Figure 3: Error decomposition
  • Figure 4: Structure of the designed anti-symmetric deep ReLU network, with inputs $x, x' \in \mathcal{X}$.
  • Figure 5: B-spline of order $7$
  • ...and 5 more figures
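Under the pairwise least squares formulation assumed above (target $y - y'$ for independent examples), the best predictor of a pair is $f_\rho(x) - f_\rho(x')$ with $f_\rho(x) = \mathbb{E}[y \mid x]$ (our notation). This function satisfies $f(x, x') = -f(x', x)$, which is one motivation for an anti-symmetric architecture like the one in Figure 4. The sketch below shows one simple way to enforce anti-symmetry: $f(x, x') = g(x, x') - g(x', x)$, where $g$ is a plain fully connected ReLU network on the concatenated input. The class name `AntiSymmetricReLUNet` and its layout are hypothetical and are not the paper's structured network.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


class AntiSymmetricReLUNet:
    """Hypothetical pairwise predictor f(x, x') = g(x, x') - g(x', x).

    g is a small fully connected ReLU network on the concatenation [x, x'].
    Subtracting the swapped evaluation makes f(x, x') = -f(x', x) hold for
    every input pair, whatever the weights are.
    """

    def __init__(self, dim, width=16, depth=3, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [2 * dim] + [width] * depth + [1]
        # Random weights scaled by 1/sqrt(fan_in); zero biases.
        self.weights = [rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, k))
                        for m, k in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(k) for k in sizes[1:]]

    def _g(self, x, xp):
        h = np.concatenate([x, xp], axis=-1)
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = relu(h @ W + b)  # hidden ReLU layers
        # Linear output layer; drop the trailing size-1 axis.
        return (h @ self.weights[-1] + self.biases[-1])[..., 0]

    def __call__(self, x, xp):
        return self._g(x, xp) - self._g(xp, x)


# Usage: a batch of 5 pairs with 3-dimensional inputs.
rng = np.random.default_rng(1)
net = AntiSymmetricReLUNet(dim=3)
x, xp = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
print(np.allclose(net(x, xp), -net(xp, x)))
```

Building anti-symmetry into the architecture, rather than hoping training recovers it, shrinks the hypothesis space to functions that share this property with the true pairwise predictor.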

Theorems & Definitions (36)

  • Definition 1: Covering number [HDP]
  • Definition 2: Pseudo-dimension [neuralnetworklearningVC]
  • Lemma 1: [empirical]
  • Definition 3: Variance-expectation bound [EM_Bartlett]
  • Theorem 1
  • Remark 1
  • Proposition 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • ...and 26 more
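Definition 1 (covering number) can be made concrete for a finite function class evaluated on a sample. The sketch below assumes a hypothetical finite class given as a matrix of function values and uses the empirical $L^2$ norm $\|f - g\| = \big(\frac{1}{n}\sum_{i=1}^n (f(x_i) - g(x_i))^2\big)^{1/2}$. It builds an $\epsilon$-cover greedily: every function ends up within $\epsilon$ of some center, so the number of centers is an upper bound on the covering number, not necessarily its exact value. The function name `greedy_cover_size` is hypothetical.

```python
import numpy as np


def greedy_cover_size(values, eps):
    """Upper bound on the empirical L2 covering number of a finite class.

    `values` has shape (num_functions, n): row k holds
    (f_k(x_1), ..., f_k(x_n)) for a hypothetical finite class.
    A function becomes a new center only if it is farther than eps from
    every existing center, so the centers form an eps-cover of the class.
    """
    centers = []
    for v in values:
        # Empirical L2 distance to each current center.
        if not any(np.sqrt(np.mean((v - c) ** 2)) <= eps for c in centers):
            centers.append(v)
    return len(centers)


# Usage: two tight clusters of functions on a 2-point sample.
vals = np.array([[0.0, 0.0], [0.05, 0.05], [1.0, 1.0], [1.02, 1.0]])
print(greedy_cover_size(vals, 0.1), greedy_cover_size(vals, 0.01))
```

Shrinking $\epsilon$ can only split clusters apart in this example, illustrating why covering numbers, and hence complexity terms built from them, grow as the resolution becomes finer.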