Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks
Hiroki Waida, Takafumi Kanamori
TL;DR
The paper studies whether multiple smooth boundaries can be learned from paired covariates, where a binary label indicates whether the two inputs in a pair are similar. It introduces a contrastive-learning empirical risk minimizer (ERM) over a localized class of simplex-valued deep ReLU networks and shows that it is consistent and attains a minimax-optimal rate, up to logarithmic factors, for the $L^{2}$-risk of the boundary indicators. A key technical contribution is a localization-based analysis that relates the pairwise hinge-loss excess risk to this $L^{2}$-risk, which yields sharp learning guarantees in the pairwise setting. The results extend to global ERMs and yield guarantees for downstream multiclass classification, making them relevant to self-supervised learning and multiclass nonparametric problems in high dimensions.
Abstract
The nonparametric estimation of smooth boundaries has been studied extensively in the conventional setting, where pairs of a single covariate and a response variable are observed. However, this setting often suffers from the cost of data collection. Recent years have seen steady development of learning algorithms for binary classification problems in which one instead observes paired covariates and a binary variable representing the statistical relationship between them. In this work, we study theoretically whether multiple smooth boundaries are learnable in the pairwise binary classification setting. We approach the question through the statistical dependence of the paired covariates and develop a learning algorithm that uses vector-valued functions. The main theorem shows that there is an empirical risk minimization algorithm over a class of deep ReLU networks that produces a consistent estimator of the indicator functions defined by smooth boundaries. We also discuss how the pairwise binary classification setting differs from the conventional settings, focusing on the structural condition imposed on function classes. As a by-product, we apply the main theorem to a multiclass nonparametric classification problem in which estimation performance is measured by the excess misclassification risk.
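To make the pairwise setting concrete, the following is a minimal sketch of an empirical pairwise hinge risk evaluated on the outputs of a simplex-valued ReLU network. It is an illustration under stated assumptions, not the estimator analyzed in the paper: the softmax output layer, the similarity score $2\langle f(x), f(x')\rangle - 1$, the label convention $y \in \{-1, +1\}$, and all function names (`simplex_relu_net`, `pairwise_hinge_risk`) are hypothetical choices made here. The sketch only evaluates the objective on random data; it does not perform the minimization.

```python
import numpy as np


def softmax(z):
    """Map real-valued scores to the probability simplex, row by row."""
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def simplex_relu_net(x, weights, biases):
    """Fully connected ReLU network whose output lies on the simplex.

    The softmax output layer is an assumption made for this sketch.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)  # hidden ReLU layer
    return softmax(h @ weights[-1] + biases[-1])


def pairwise_hinge_risk(f_x, f_xp, y):
    """Empirical hinge risk for pairs with labels y in {-1, +1}.

    Assumed score: 2 * <f(x), f(x')> - 1, which lies in [-1, 1]
    because both outputs are points on the simplex.
    """
    score = 2.0 * np.sum(f_x * f_xp, axis=-1) - 1.0
    return float(np.mean(np.maximum(0.0, 1.0 - y * score)))


rng = np.random.default_rng(0)
d, width, K, n = 3, 8, 4, 5  # input dim, hidden width, output dim, pairs
weights = [rng.normal(size=(d, width)), rng.normal(size=(width, K))]
biases = [np.zeros(width), np.zeros(K)]

# Random paired covariates and random similarity labels (synthetic data).
x, xp = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
risk = pairwise_hinge_risk(simplex_relu_net(x, weights, biases),
                           simplex_relu_net(xp, weights, biases), y)
```

Under the assumed score, a pair labeled similar incurs zero loss when both inputs are mapped to the same vertex of the simplex, and a pair labeled dissimilar incurs zero loss when they are mapped to different vertices. This is one way to see how a pairwise objective can encode a partition of the input space into regions.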
