Fundamental Limits of Deep Learning-Based Binary Classifiers Trained with Hinge Loss

Tilahun M. Getu; Georges Kaddoum; M. Bennis

TL;DR

The paper tackles a fundamental open question: what are the intrinsic testing performance limits of deep learning-based binary classifiers trained with hinge loss? It develops an asymptotic theory for two network families—deep ReLU FNNs and deep FNNs with ReLU+Tanh—characterizing misclassification performance in regimes where the penultimate-layer output norms become very large or vanish. The authors prove that, under the stated limits, the misclassification probability cannot beat coin-toss performance ($P_e \le 1/2$), with universal applicability across data sizes, depth, and width; they validate these limits through extensive BPSK-over-AWGN experiments, demonstrating when the theory aligns with practice and when it does not. The work provides a foundational lens for interpreting DL-based binary classifiers, highlighting the gap between empirical gains and fundamental limits and motivating non-asymptotic and multi-class extensions for practical relevance.

Abstract

Although deep learning (DL) has led to several breakthroughs in many disciplines, a fundamental understanding of why and how DL is empirically successful remains elusive. To attack this fundamental problem and unravel the mysteries behind DL's empirical successes, significant innovations toward a unified theory of DL have been made. Although these innovations encompass nearly fundamental advances in optimization, generalization, and approximation, no work has quantified the testing performance of a DL-based algorithm employed to solve a pattern classification problem. To overcome this fundamental challenge in part, this paper exposes the fundamental testing performance limits of DL-based binary classifiers trained with hinge loss. For binary classifiers that are based on deep rectified linear unit (ReLU) feedforward neural networks (FNNs) and deep FNNs with ReLU and Tanh activation, we derive their respective novel asymptotic testing performance limits, which are validated by extensive computer experiments.
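
To make the quantities named above concrete, the following is a minimal sketch of the hinge loss and of an empirical misclassification probability $P_e$ for BPSK symbols over an AWGN channel. It is not the authors' experimental setup: the function names, the SNR convention (unit symbol energy, SNR equal to the inverse noise variance), the sample count, and the two toy scorers are assumptions made for illustration. The constant scorer only serves as a reference point for the coin-toss level of 1/2; it is not a model of the networks analyzed in the paper.

```python
import math
import random


def hinge_loss(label, score):
    """Hinge loss max(0, 1 - y * f(x)) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - label * score)


def simulate_bpsk_awgn(num_symbols, snr_db, rng):
    """Draw equiprobable BPSK symbols and add real Gaussian noise.

    Assumption: unit symbol energy and SNR defined as 1 / noise variance.
    """
    noise_std = math.sqrt(1.0 / (10.0 ** (snr_db / 10.0)))
    labels = [rng.choice((-1, 1)) for _ in range(num_symbols)]
    received = [y + rng.gauss(0.0, noise_std) for y in labels]
    return labels, received


def misclassification_rate(labels, scores):
    """Empirical P_e: fraction of samples whose sign(score) differs from the label."""
    errors = sum(1 for y, s in zip(labels, scores) if (1 if s >= 0 else -1) != y)
    return errors / len(labels)


rng = random.Random(0)  # fixed seed so the run is repeatable
labels, received = simulate_bpsk_awgn(20000, snr_db=0.0, rng=rng)

# Identity scorer f(x) = x: the plain sign detector (hypothetical baseline).
pe_sign = misclassification_rate(labels, received)
mean_hinge = sum(hinge_loss(y, x) for y, x in zip(labels, received)) / len(labels)

# Constant scorer f(x) = 1: ignores the input, so it errs on every -1 symbol.
pe_constant = misclassification_rate(labels, [1.0] * len(labels))
```

With equiprobable symbols, the constant scorer is wrong about half the time regardless of SNR, whereas the sign detector's error rate falls as the SNR grows; comparing the two gives a feel for what a testing $P_e$ near 0.5 means in the figures listed below.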

Paper Structure (23 sections, 3 theorems, 22 equations, 15 figures, 5 tables)

Introduction
  Related Works
  Motivation and Context
  Contributions
Prelude and System Setup
  Prelude
  System Setup
Problem Formulation
  Problems for ReLU FNNs-Based Binary Classifiers
  Problems for Binary Classifiers based on FNNs with ReLU and Tanh Activation
Asymptotic Testing Performance Limits
  Performance Limits of Binary Classifiers that are Based on Deep ReLU FNNs
  Performance Limits of Binary Classifiers based on FNNs with ReLU and Tanh Activation
Computer Experiments
  DL-Based Binary Classification Settings
...and 8 more sections

Key Result

Lemma 1

For $y_n\in\{-1, 1\}$ and $n\in\mathbb{N}$, ...

Proof. The proof is provided in arXiv_Getu_Fundamental_Limits'23.

Figures (15)

Figure 1: A scatter plot of $\| \bm{y}_{K-1,n}^{(T)} \|$ versus $n$ under All-SNR-T, $(K, H)=(8,8)$, and testing at 35 dB: the computed testing $P_e$ at 35 dB is $P_e=0.50049999$.
Figure 2: A scatter plot of $\| \bm{y}_{K-1,n}^{(T)} \|$ versus $n$ under High-SNR-T, $(K, H)=(8,8)$, and testing at 35 dB: the computed testing $P_e$ at 35 dB is $P_e=0.50049999$.
Figure 3: A scatter plot of $\| \bm{y}_{K-1,n}^{(T)} \|$ versus $n$ under Low-SNR-T, $(K, H)=(8,8)$, and testing at 35 dB: the computed testing $P_e$ at 35 dB is $P_e=0.50049999$.
Figure 4: A scatter plot of $\| \bm{y}_{K-1,n}^{(T)} \|$ versus $n$ under All-SNR-T, $(K, H)=(16,16)$, and testing at 35 dB: the computed testing $P_e$ at 35 dB is $P_e=1.0$.
Figure 5: A scatter plot of $\| \bm{y}_{K-1,n}^{(T)} \|$ versus $n$ under High-SNR-T, $(K, H)=(16,16)$, and testing at 35 dB: the computed testing $P_e$ at 35 dB is $P_e=0.50049999$.
...and 10 more figures

Theorems & Definitions (12)

Definition 1: Definition of FNNs Ingo_Error_Bounds'19
Definition 2: FNNs with dual activation functions Haykin_NNs_09
Remark 1
Lemma 1
Theorem 1
Remark 2
Remark 3
Remark 4
Theorem 2
Remark 5
...and 2 more
