
Nonparametric logistic regression with deep learning

Atsutomo Yara, Yoshikazu Terada

TL;DR

This paper addresses nonparametric logistic regression, where the KL-based excess risk can diverge. It develops a direct analysis of the nonparametric maximum likelihood estimator (NPMLE) in the squared Hellinger distance. Building on the framework of van de Geer, it proves an oracle inequality that bounds the estimation error by a benchmark approximation error plus a local complexity term, which yields convergence rates under mild conditions. The authors then specialize the theory to dense (fully connected) deep neural networks. When the true conditional class probabilities have a composition structure, NPMLEs based on deep nets achieve near-minimax convergence rates that, under suitable smoothness, can be independent of the input dimension. These results give rigorous, practical guarantees for probability estimation with deep learning in high-dimensional nonparametric settings. They also clarify when dense DNNs can attain optimal rates without sparsity assumptions.
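One way to make "rates independent of the input dimension" concrete is the composition rate of Schmidt-Hieber (2020) for nonparametric regression: $\phi_n = \max_i n^{-2\beta_i^*/(2\beta_i^*+t_i)}$ with $\beta_i^* = \beta_i \prod_{l>i} \min(\beta_l, 1)$. Here $\beta_i$ is the smoothness of the $i$-th component and $t_i$ is the number of variables it actually depends on. This sketch assumes, without confirmation from the source, that the paper's Hellinger rates have this form. The function name and the example values of $\beta_i$ and $t_i$ are hypothetical.

```python
import math

def composition_rate(n, betas, ts):
    """Composition rate phi_n = max_i n^{-2 beta_i^* / (2 beta_i^* + t_i)}.

    Assumed benchmark (Schmidt-Hieber, 2020), not necessarily the paper's exact
    rate. betas[i] and ts[i] are the smoothness and effective input dimension
    of the i-th component g_i in g_q o ... o g_0 (hypothetical example values).
    Returns (rate, smallest exponent).
    """
    exponents = []
    for i, (beta, t) in enumerate(zip(betas, ts)):
        # beta_i^* = beta_i * prod_{l > i} min(beta_l, 1); empty product is 1
        beta_star = beta * math.prod(min(b, 1.0) for b in betas[i + 1:])
        exponents.append(2 * beta_star / (2 * beta_star + t))
    # max_i n^{-e_i} = n^{-min_i e_i}: the slowest component dominates
    e = min(exponents)
    return n ** (-e), e

n = 10**6
print(composition_rate(n, [2.0, 3.0], [2, 3]))  # composition with t_0 = 2, t_1 = 3
print(composition_rate(n, [2.0], [100]))        # one beta = 2 function of d = 100 inputs
```

In this hypothetical example the composition exponent is 2/3 whatever the ambient dimension is. The unstructured exponent for a $\beta = 2$ function of 100 variables is only 4/104. Logarithmic factors, which "near-minimax" usually refers to, are ignored.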

Abstract

Consider the nonparametric logistic regression problem. In logistic regression, we usually consider the maximum likelihood estimator, and the excess risk is the expectation of the Kullback-Leibler (KL) divergence between the true and estimated conditional class probabilities. However, in nonparametric logistic regression, the KL divergence can easily diverge, and thus the convergence of the excess risk is difficult to prove or does not hold. Several existing studies show the convergence of the KL divergence under strong assumptions. In most cases, our goal is to estimate the true conditional class probabilities. Thus, instead of analyzing the excess risk itself, it suffices to show the consistency of the maximum likelihood estimator in some suitable metric. In this paper, using a simple unified approach for analyzing the nonparametric maximum likelihood estimator (NPMLE), we directly derive convergence rates of the NPMLE in the Hellinger distance under mild assumptions. Although our results are similar to those in some existing studies, we provide simpler and more direct proofs. As an important application, we derive convergence rates of the NPMLE with fully connected deep neural networks and show that the derived rate nearly achieves the minimax optimal rate.
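At a single input $x$, the excess risk integrates $\mathrm{KL}(p_0(x) \,\|\, \hat{p}(x))$ over $x$. It therefore blows up as soon as one estimated class probability collapses to zero where the true one is positive. The squared Hellinger distance stays in $[0, 1]$ in the same situation. The short Python sketch below illustrates this for two classes. The helper names are hypothetical. It uses the $\tfrac12\sum_k(\sqrt{p_k}-\sqrt{q_k})^2$ convention for the squared Hellinger distance, and some authors drop the factor $1/2$.

```python
import math

def kl(p, q):
    """KL(p || q) for probability vectors; infinite if q_k = 0 while p_k > 0."""
    total = 0.0
    for pk, qk in zip(p, q):
        if pk == 0.0:
            continue  # 0 * log(0 / q) = 0 by convention
        if qk == 0.0:
            return math.inf
        total += pk * math.log(pk / qk)
    return total

def hellinger_sq(p, q):
    """Squared Hellinger distance with the 1/2 convention, so it lies in [0, 1]."""
    return 0.5 * sum((math.sqrt(pk) - math.sqrt(qk)) ** 2 for pk, qk in zip(p, q))

p_true = [0.1, 0.9]  # true conditional class probabilities at some x
for eps in [1e-2, 1e-4, 1e-8, 0.0]:
    p_hat = [eps, 1.0 - eps]  # estimate that pushes class 1 toward probability 0
    print(f"eps={eps:g}  KL={kl(p_true, p_hat):.3f}  H^2={hellinger_sq(p_true, p_hat):.3f}")
```

As `eps` shrinks, the KL column grows without bound and is infinite at `eps = 0`. The `H^2` column stays small, below 0.06 for this `p_true`.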

Paper Structure

This paper contains 16 sections, 20 theorems, 152 equations, and 1 table.

Key Result

Theorem 3.1

Consider the $K$-class classification model defined in (data-generating model). Let $\hat{\bm{p}}_n$ be the NPMLE given in (mle), and suppose that the Assumption is satisfied. Take $\Psi(\delta) \geq J_B(\delta, \bar{\mathcal{F}}_n^{1/2}(\tilde{\bm{p}}_n, \delta), \mu)$ in such a way that $\Psi(\delta)/\delta^2$ is a non-increasing function of $\delta$. … We have, for all $\delta \geq \delta_n$, …
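In van de Geer's framework, the radius $\delta_n$ in a result like Theorem 3.1 is typically the smallest $\delta$ with $\sqrt{n}\,\delta^2 \geq c\,\Psi(\delta)$. Requiring $\Psi(\delta)/\delta^2$ to be non-increasing makes $\sqrt{n}\,\delta^2/\Psi(\delta)$ increasing, so this threshold is well defined. The sketch below computes such a $\delta_n$ numerically. It takes the entropy integral to be $\max\{\int_{\delta^2/c_0}^{\delta} H_B^{1/2}(u)\,du,\ \delta\}$, an assumed form. The bracketing-entropy bound $H_B(u) = W\log(e/u)$ and the constants $c_0$ and $c$ are hypothetical placeholders, not the paper's values.

```python
import math

def entropy_bound(u, W, A=math.e):
    # Hypothetical bracketing-entropy bound H_B(u) = W * log(A / u),
    # a typical shape for classes indexed by W bounded parameters.
    return W * math.log(A / u)

def psi(delta, W, c0=8.0, steps=200):
    # Assumed entropy integral: max( int_{delta^2/c0}^{delta} sqrt(H_B(u)) du, delta ),
    # approximated with the midpoint rule; c0 is a placeholder constant.
    lo, hi = delta ** 2 / c0, delta
    h = (hi - lo) / steps
    integral = h * sum(math.sqrt(entropy_bound(lo + (j + 0.5) * h, W)) for j in range(steps))
    return max(integral, delta)

def critical_radius(n, W, c=1.0, grid=1000):
    # Smallest delta on a grid in (0, 1] with sqrt(n) * delta^2 >= c * Psi(delta).
    for k in range(1, grid + 1):
        delta = k / grid
        if math.sqrt(n) * delta ** 2 >= c * psi(delta, W):
            return delta
    return 1.0  # cap at 1, a trivial bound for the Hellinger distance

for n in [10**2, 10**3, 10**4]:
    print(n, critical_radius(n, W=10))
```

With these placeholder choices, $\delta_n^2$ scales roughly like $W \log n / n$. To adapt the sketch to a specific network class, one would replace only `entropy_bound`.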

Theorems & Definitions (47)

  • Definition 2.1: Small Value Bound (Definition 3.1 of Bos and Schmidt-Hieber, 2022)
  • Definition 3.1: Bracketing Number for $L^p(Q)$ Metric
  • Theorem 3.1: Oracle inequality
  • Corollary 3.2
  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • Theorem 4.1: Convergence rates
  • Remark 4.1
  • Remark 4.2
  • ...and 37 more
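For reference, the first two definitions in this list can be written out. The block below gives the standard bracketing number and entropy from empirical process theory, and the small value bound in the form commonly attributed to Bos and Schmidt-Hieber (2022). Both are assumed forms; the paper's exact statements (for example, constants or the range of $t$) may differ.

```latex
% Bracketing number and entropy of a class \mathcal{G} in L^p(Q)
% (standard form; assumed to match Definition 3.1),
% where \|g\|_{L^p(Q)} = (\int |g|^p \, dQ)^{1/p}.
N_B(\varepsilon, \mathcal{G}, L^p(Q)) := \min\Big\{ N :\ \exists\, \{[g_j^L, g_j^U]\}_{j=1}^{N}
  \text{ with } \|g_j^U - g_j^L\|_{L^p(Q)} \le \varepsilon \ \forall j,\
  \forall g \in \mathcal{G}\ \exists j:\ g_j^L \le g \le g_j^U \Big\},
\qquad
H_B(\varepsilon, \mathcal{G}, L^p(Q)) := \log N_B(\varepsilon, \mathcal{G}, L^p(Q)).

% Small value bound with index \alpha \ge 0 for p_0 = (p_{0,1}, \dots, p_{0,K})
% (assumed form, after Bos and Schmidt-Hieber, 2022): there is C > 0 such that
P_X\big(p_{0,k}(X) \le t\big) \le C\, t^{\alpha}
  \quad \text{for all } t \in (0, 1] \text{ and } k = 1, \dots, K.
```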