Classification with Deep Neural Networks and Logistic Loss

Zihan Zhang; Lei Shi; Ding-Xuan Zhou

TL;DR

The paper derives generalization bounds for binary classification with deep ReLU networks trained under the logistic loss, a setting in which the target function $f^*_{\phi,P}$ is unbounded. It introduces an oracle-type inequality that uses a carefully chosen bivariate function $\psi$ to bound the excess $\phi$-risk without any boundedness assumption on the target. This yields sharp rates under Hölder smoothness (of order $\beta$) of the conditional probability $\eta$, and dimension-free rates under a compositional structure. For the excess logistic risk $\mathcal{E}_P^\phi(\hat{f}_n^{\mathbf{FNN}})$, the authors obtain the rate $O\left(\left(\frac{(\log n)^5}{n}\right)^{\beta/(\beta+d)}\right)$, which is optimal up to log factors; corresponding misclassification rates follow via calibration, and minimax lower bounds confirm near-optimality. Together, these results deepen the theoretical understanding of DNN-based binary classification with logistic loss and offer insight into why deep networks can solve high-dimensional problems effectively in practice.
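To get a feel for how the quoted rate depends on the input dimension $d$, the short Python sketch below evaluates $\left((\log n)^5/n\right)^{\beta/(\beta+d)}$ for a few values of $d$. It is only an illustration: the function name `rate` and the chosen values of `n`, `beta`, and `d` are assumptions made here, and the constant hidden in the $O(\cdot)$ is ignored.

```python
import math


def rate(n: int, beta: float, d: int) -> float:
    """Evaluate ((log n)^5 / n)^(beta / (beta + d)).

    This is the rate quoted in the TL;DR with the big-O constant dropped.
    The name `rate` and its signature are illustrative, not from the paper.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log n > 0")
    if beta <= 0 or d < 1:
        raise ValueError("beta must be positive and d must be at least 1")
    return (math.log(n) ** 5 / n) ** (beta / (beta + d))


if __name__ == "__main__":
    n = 10**9  # illustrative sample size
    for d in (1, 10, 100):
        # For fixed n and beta the bound decays more slowly as d grows.
        print(f"d={d:3d}  rate={rate(n, beta=1.0, d=d):.4f}")
```

For a fixed sample size the exponent $\beta/(\beta+d)$ shrinks as $d$ grows, so the bound approaches 1; this is the dimension dependence that the compositional assumption is meant to remove.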

Abstract

Deep neural networks (DNNs) trained with the logistic loss (i.e., the cross entropy loss) have made impressive advancements in various binary classification tasks. However, generalization analysis for binary classification with DNNs and logistic loss remains scarce. The unboundedness of the target function for the logistic loss is the main obstacle to deriving satisfactory generalization bounds. In this paper, we aim to fill this gap by establishing a novel and elegant oracle-type inequality, which enables us to deal with the boundedness restriction of the target function, and using it to derive sharp convergence rates for fully connected ReLU DNN classifiers trained with logistic loss. In particular, we obtain optimal convergence rates (up to log factors) only requiring the Hölder smoothness of the conditional class probability $\eta$ of data. Moreover, we consider a compositional assumption that requires $\eta$ to be the composition of several vector-valued functions of which each component function is either a maximum value function or a Hölder smooth function only depending on a small number of its input variables. Under this assumption, we derive optimal convergence rates (up to log factors) which are independent of the input dimension of data. This result explains why DNN classifiers can perform well in practical high-dimensional classification problems. Besides the novel oracle-type inequality, the sharp convergence rates given in our paper also owe to a tight error bound for approximating the natural logarithm function near zero (where it is unbounded) by ReLU DNNs. In addition, we justify our claims for the optimality of rates by proving corresponding minimax lower bounds. All these results are new in the literature and will deepen our theoretical understanding of classification with DNNs.
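The unboundedness mentioned in the abstract can be seen from a standard pointwise computation, sketched below. The notation $\eta(x)=P(Y=1\mid X=x)$ and the auxiliary symbol $C_\eta$ are assumptions made for this sketch; the paper's own definitions and conventions may differ in detail.

```latex
% Assumed notation: \eta = \eta(x) = P(Y = 1 \mid X = x) with 0 < \eta < 1,
% and \phi(t) = \log(1 + e^{-t}). C_\eta is a helper symbol used only here.
\begin{align*}
  C_\eta(t) &= \eta\,\phi(t) + (1-\eta)\,\phi(-t)
    && \text{(conditional $\phi$-risk of predicting $t$ at $x$)} \\
  \frac{d}{dt}\,C_\eta(t) &= -\frac{\eta}{1+e^{t}} + \frac{(1-\eta)\,e^{t}}{1+e^{t}} = 0
    \;\iff\; e^{t} = \frac{\eta}{1-\eta} \\
  f^*_{\phi,P}(x) &= \log\frac{\eta(x)}{1-\eta(x)}
    && \text{(pointwise minimizer)}
\end{align*}
% As \eta(x) \to 1 the minimizer tends to +\infty, and as \eta(x) \to 0 it
% tends to -\infty, so f^*_{\phi,P} is unbounded unless \eta stays away
% from 0 and 1.
```

The same formula suggests why approximating the logarithm near zero matters: representing $\log\eta$ and $\log(1-\eta)$ accurately is hardest exactly where $\eta$ approaches 0 or 1.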

Paper Structure (6 sections, 2 theorems, 67 equations, 1 figure)

Introduction
Conventions and Notations
Spaces of Fully Connected Neural Networks
Glossary
Main Results
Main Upper Bounds

Key Result

Theorem 2.1

Let $\{(X_i,Y_i)\}_{i=1}^n$ be an i.i.d. sample of a probability distribution $P$ on $[0,1]^d\times\{-1,1\}$, $\mathcal{F}$ be a nonempty class of uniformly bounded real-valued functions defined on $[0,1]^d$, and $\hat{f}_n$ be an ERM with respect to the logistic loss $\phi(t)=\log(1+\mathrm{e}^{-t})$. If there exists a measurable function $\psi:[0,1]^d\times\{-1,1\}\to\mathbb{R}$ and a constant trip
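The objects named in the theorem statement (the logistic loss $\phi$, a uniformly bounded class $\mathcal{F}$, and an empirical risk minimizer $\hat{f}_n$) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the class is a small finite set of clipped linear functions on $[0,1]$ rather than a neural network class, and the helper names `logistic_loss`, `empirical_risk`, `erm`, and `make_clipped_linear` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[Tuple[float, int]]  # pairs (x, y) with x in [0, 1], y in {-1, +1}
Predictor = Callable[[float], float]


def logistic_loss(t: float) -> float:
    """phi(t) = log(1 + exp(-t)), rearranged to avoid overflow for large |t|."""
    return max(-t, 0.0) + math.log1p(math.exp(-abs(t)))


def empirical_risk(f: Predictor, sample: Sample) -> float:
    """Average logistic loss of f on the sample: (1/n) * sum_i phi(y_i * f(x_i))."""
    return sum(logistic_loss(y * f(x)) for x, y in sample) / len(sample)


def erm(candidates: List[Predictor], sample: Sample) -> Predictor:
    """Return a candidate with the smallest empirical logistic risk."""
    return min(candidates, key=lambda f: empirical_risk(f, sample))


def make_clipped_linear(slope: float, bound: float) -> Predictor:
    """Hypothetical bounded predictor: slope * (x - 0.5), clipped to [-bound, bound]."""
    return lambda x: max(-bound, min(bound, slope * (x - 0.5)))


if __name__ == "__main__":
    # Toy sample: label -1 on the left half of [0, 1], +1 on the right half.
    sample = [(0.1, -1), (0.2, -1), (0.8, 1), (0.9, 1)]
    slopes = (-4.0, 0.0, 4.0)
    candidates = [make_clipped_linear(s, bound=2.0) for s in slopes]
    best = erm(candidates, sample)
    print("selected slope:", slopes[candidates.index(best)])
```

Clipping keeps every candidate within $[-2, 2]$, mirroring the uniform boundedness required of $\mathcal{F}$, even though the minimizer of the population logistic risk itself need not be bounded.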

Theorems & Definitions (2)

Theorem 2.1
Theorem 2.2
