Fine-grained Analysis of Non-parametric Estimation for Pairwise Learning

Junyu Zhou; Shuo Huang; Han Feng; Puyu Wang; Ding-Xuan Zhou

TL;DR

This work relaxes the convexity and VC-class assumptions common in pairwise learning theory, establishes a sharp oracle inequality for the empirical minimizer over a general hypothesis space under Lipschitz continuous pairwise losses, and designs a targeted hypothesis space of structured deep ReLU networks with controllable complexity.

Abstract

In this paper, we are concerned with the generalization performance of non-parametric estimation for pairwise learning. Most of the existing work requires the hypothesis space to be convex or a VC-class, and the loss to be convex. However, these restrictive assumptions limit the applicability of the results in studying many popular methods, especially kernel methods and neural networks. We significantly relax these restrictive assumptions and establish a sharp oracle inequality of the empirical minimizer with a general hypothesis space for the Lipschitz continuous pairwise losses. As an example, we apply our general results to study pairwise least squares regression and derive an excess population risk bound that matches the minimax lower bound for the pointwise least squares regression. The key novelty lies in constructing a structured deep ReLU neural network to approximate the true predictor, and in designing a targeted hypothesis space composed of networks with this structure and controllable complexity. Experiments validate the effectiveness of the proposed method. This example demonstrates that the obtained general results indeed help us to explore the generalization performance on a variety of problems that cannot be handled by existing approaches.
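To make the pairwise least squares setting concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common formulation in which the target for a pair is the label difference $y - y'$ and the empirical risk averages the squared error over all ordered pairs of distinct samples. The names `antisymmetrize`, `pairwise_ls_risk`, and the toy base function `h` are hypothetical; the anti-symmetric wrapper $f(x,x') = h(x,x') - h(x',x)$ is one simple way to enforce $f(x,x') = -f(x',x)$ and stands in for the structured network described in the abstract.

```python
import itertools
import numpy as np


def antisymmetrize(h):
    """Turn any h(x, x') into f(x, x') = h(x, x') - h(x', x), so f(x, x') = -f(x', x)."""
    def f(x, x_prime):
        return h(x, x_prime) - h(x_prime, x)
    return f


def pairwise_ls_risk(f, X, y):
    """Average of (f(x_i, x_j) - (y_i - y_j))^2 over all ordered pairs with i != j.

    Assumed loss form: the pair target is the label difference y_i - y_j.
    """
    n = len(X)
    total = 0.0
    for i, j in itertools.permutations(range(n), 2):
        total += (f(X[i], X[j]) - (y[i] - y[j])) ** 2
    return total / (n * (n - 1))


# Toy data: noiseless linear labels (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 3))
w = np.array([1.0, -2.0, 0.5])
y = X @ w


def h(x, x_prime):
    # Hypothetical base function; after antisymmetrization f(x, x') = (x - x') . w
    return 0.5 * float((x - x_prime) @ w)


f = antisymmetrize(h)
risk = pairwise_ls_risk(f, X, y)
print(f"empirical pairwise risk: {risk:.3e}")
```

In this toy case the anti-symmetrized predictor equals $(x - x')^\top w = y - y'$, so the empirical risk vanishes up to floating-point rounding. The double loop costs $O(n^2)$ evaluations, which is the usual price of a U-statistic style objective.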

Paper Structure (16 sections, 17 theorems, 104 equations, 10 figures, 3 tables)

Introduction
Learning Setting and Preliminaries
Main Results
Optimal rates with deep ReLU networks
A novel approximation of the true predictor
Pairwise least squares regression with deep ReLU networks
Experiments
Evaluation of Approximation Error
Non-transitive Pairwise Interactions
Real-world data
Conclusion
Proofs for an oracle inequality
Upper bounds for $S_1(\mathcal{H})$
Upper bounds for $S_2(\mathcal{H})$
Proof of Theorem 1
...and 1 more section

Key Result

Lemma 1

For a VC-class $\mathcal{F}$ of functions with uniform bound $F$, one has for any probability measure $\rho$, for an absolute constant $C > 0$ and $0 < \epsilon < 1$, where $L^2_\rho$ denotes the $L^2$ norm with respect to $\rho_{\boldsymbol{x}}$.
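The statement above is missing its displayed inequality. For orientation only, the classical covering-number bound for VC-classes from the empirical-process literature has the shape below. That this is the exact inequality intended by Lemma 1 is an assumption, as is the symbol $V$ for the VC-index of $\mathcal{F}$; the constants and exponent in the paper may differ.

```latex
\mathcal{N}\bigl(\epsilon F,\ \mathcal{F},\ L^2_\rho\bigr)
  \;\le\; C \, V \, (16e)^{V} \left(\frac{1}{\epsilon}\right)^{2(V-1)},
  \qquad 0 < \epsilon < 1 .
```

Read this way, the covering number grows only polynomially in $1/\epsilon$, with a degree governed by $V$ and independent of the measure $\rho$.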

Figures (10)

Figure 1: Pointwise learning vs. Pairwise learning
Figure 2: Diagram of the relationships among theorems and assumptions
Figure 3: Error decomposition
Figure 4: Structure of the designed anti-symmetric deep ReLU network \ref{eq:structured_NNs} with input $x, x' \in \mathcal{X}$.
Figure 5: B-spline of order $7$
...and 5 more figures

Theorems & Definitions (36)

Definition 1: Covering number [HDP]
Definition 2: Pseudo-dimension [neuralnetworklearningVC]
Lemma 1 [empirical]
Definition 3: Variance-expectation bound [EM_Bartlett]
Theorem 1
Remark 1
Proposition 1
Theorem 2
Theorem 3
Theorem 4
...and 26 more
