Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks

Hiroki Waida; Takafumi Kanamori

TL;DR

The paper addresses learning multiple smooth boundaries from pairwise covariate data, where a binary label indicates similarity between paired inputs. It introduces a contrastive-learning ERM over a localized class of simplex-valued deep ReLU networks, demonstrating consistency and a minimax-optimal rate up to logarithmic factors for the $L^{2}$-risk of boundary indicators. A key technical contribution is the localization-based analysis that connects pairwise hinge-loss excess risk to the desired $L^{2}$-risk, enabling sharp learning guarantees in the pairwise setting. The results extend to global ERMs and yield downstream multiclass classification guarantees, highlighting the practical impact for self-supervised and multiclass nonparametric problems in high dimensions.
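As a rough illustration of the ingredients named above (a simplex-valued ReLU network and an empirical pairwise hinge risk), the following is a minimal sketch and not the paper's construction. The softmax output layer, the score `2<f(x), f(x')> - 1`, and the names `relu_net` and `pairwise_hinge_risk` are assumptions made here for illustration; the paper's contrastive function and localized function class may be defined differently.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax; each output row lies in the probability simplex.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relu_net(x, weights, biases):
    # Fully connected ReLU network; the final softmax (an assumption of this
    # sketch) forces the outputs to be simplex-valued.
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return softmax(h @ weights[-1] + biases[-1])

def pairwise_hinge_risk(fx, fxp, y):
    # Hypothetical similarity score in [-1, 1]: 2 * <f(x), f(x')> - 1.
    # y = +1 marks a similar pair, y = -1 a dissimilar pair.
    score = 2.0 * np.sum(fx * fxp, axis=-1) - 1.0
    return np.mean(np.maximum(0.0, 1.0 - y * score))

# Toy usage on random paired covariates (illustrative data only).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 4)), rng.normal(size=(4, 3))]
biases = [np.zeros(4), np.zeros(3)]
x, xp = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
y = rng.choice([-1, 1], size=5)
risk = pairwise_hinge_risk(relu_net(x, weights, biases),
                           relu_net(xp, weights, biases), y)
```

An ERM in this sketch would minimize `pairwise_hinge_risk` over the network parameters; a downstream multiclass prediction could then be read off as the index of the largest output coordinate.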

Abstract

Nonparametric estimation of smooth boundaries has been studied extensively in the conventional setting, where pairs consisting of a single covariate and a response variable are observed. However, data collection in this traditional setting is often costly. Recent years have witnessed the steady development of learning algorithms for binary classification problems in which one instead observes paired covariates and a binary variable representing the statistical relationship between the covariates. In this work, we theoretically study whether multiple smooth boundaries are learnable in the pairwise binary classification setting. We approach the question through the statistical dependence of the paired covariates and develop a learning algorithm using vector-valued functions. The main theorem shows that there is an empirical risk minimization algorithm over a class of deep ReLU networks that produces a consistent estimator for indicator functions defined by smooth boundaries. We also discuss how the pairwise binary classification setting differs from the conventional settings, focusing on the structural condition of function classes. As a by-product, we apply the main theorem to a multiclass nonparametric classification problem in which estimation performance is measured by the excess misclassification risk.

Paper Structure (64 sections, 28 theorems, 194 equations, 1 figure, 6 tables, 1 algorithm)

Introduction
Approach
Main Contributions
Methodology
Results
Technical contributions
Application
Organization of the Paper
Problem Setting
Notations
Assumptions
Noise condition (NC)
Smooth boundaries
Main assumptions
Definition of the map $\mathscr{S}_{P}$
...and 49 more sections

Key Result

Proposition 3

For every $\mathscr{S}\in\mathscr{P}_{\alpha,+}$, there is a Borel probability measure $P\in\mathcal{P}_{\alpha}$ such that we have $\mathscr{S}\sim \mathscr{S}_{P}$.

Figures (1)

Figure 1: An illustration in the case where $d_{1}=3$, and both $z_{0}=f(x)$ and $z_{4}=f(x')$ are in the simplex $\{z=\sum_{h\in\{i,j,h_{0}\}}c_{h}v_{h}\;|\; c_{i},c_{j},c_{h_{0}}\in[0,1],\; c_{i}+c_{j}+c_{h_{0}}=1\}$. Note that in this case we have $z_{1}=z_{2}$, though in general $z_{1}$ is not on the line segment connecting $v_{i}$ and $v_{h_{0}}$. We used NumPy (Harris et al., 2020) and Matplotlib (Hunter, 2007) to plot this figure.

Theorems & Definitions (66)

Definition 1: Class $\mathcal{P}_{\alpha}$
Definition 2
Proposition 3
Remark 4
Definition 5: Contrastive function
Proposition 6
Definition 7: Loss function
Proposition 8
Example 9
Definition 10: $(\beta,\beta_{0},P)$-localized subclass
...and 56 more
