Nonparametric hazard rate estimation with associated kernels and minimax bandwidth choice

Luce Breuil, Sarah Kaakaï

TL;DR

The paper develops a unified nonparametric hazard-rate estimation framework using associated kernels whose shapes depend on the estimation point, providing a second-order MISE expansion and an asymptotic normality result. It extends the Goldenshluger–Lepski oracle inequality to both pointwise and global minimax bandwidth selection for hazard-rate estimation, addressing challenges from unbounded kernel supports. The Gamma kernel is shown to satisfy the proposed assumptions, with explicit results for the MISE and the asymptotic distribution. Numerical experiments on simulated data and on Drosophila aging data illustrate the reduction in boundary bias and the practical value of minimax bandwidths. Overall, the work offers theoretical guarantees and practical tools for hazard-rate estimation with flexible, boundary-aware kernels and data-driven bandwidth selection, paving the way for designing new associated kernels in survival analysis.

Abstract

In this paper, we introduce a general theoretical framework for nonparametric hazard rate estimation using associated kernels, whose shapes depend on the point of estimation. Within this framework, we establish rigorous asymptotic results, including a second-order expansion of the MISE, and a central limit theorem for the proposed estimator. We also prove a new oracle-type inequality for both local and global minimax bandwidth selection, extending the Goldenshluger-Lepski method to the context of associated kernels. Our results propose a systematic way to construct and analyze new associated kernels. Finally, we show that the general framework applies to the Gamma kernel, and we provide several examples of applications on simulated data and experimental data for the study of aging.

Paper Structure

This paper contains 39 sections, 20 theorems, 210 equations, 6 figures, 2 tables.

Key Result

Proposition 2.1

The Gamma kernel of Definition 2.2 satisfies Definition 2.1 (associated kernel) with $\mathbb{S} = \mathbb{R}_+$, and Assumptions assump:gamma_1 to assump:gamma_2_inf with $\gamma = 1/2$.
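
To make the idea of a kernel whose shape depends on the estimation point concrete, here is a minimal sketch of a Gamma-kernel hazard estimate. It rests on assumptions that are not taken from the paper: the kernel is the classical Chen-type Gamma kernel (a Gamma density with shape $x/b + 1$ and scale $b$), which may differ from the "Gamma kernel without interior bias" of Definition 2.2, and the estimator smooths Nelson-Aalen increments of fully observed (uncensored) lifetimes, which may differ from the paper's estimator. The names `gamma_kernel` and `hazard_estimate` are hypothetical.

```python
import math
import random


def gamma_kernel(x, b, t):
    """Assumed Chen-type Gamma kernel: Gamma(shape=x/b+1, scale=b) density at t.

    The shape parameter changes with the estimation point x, and the
    support is [0, inf), so no mass is placed on negative times.
    """
    if t <= 0.0:
        return 0.0  # outside the support (t = 0 is a measure-zero event)
    shape = x / b + 1.0
    log_pdf = ((shape - 1.0) * math.log(t) - t / b
               - shape * math.log(b) - math.lgamma(shape))
    return math.exp(log_pdf)


def hazard_estimate(x, b, death_times):
    """Kernel-smoothed Nelson-Aalen increments (assumes no censoring).

    Each ordered death time T_(i) contributes K_{x,b}(T_(i)) divided by the
    number of individuals still at risk just before T_(i).
    """
    ordered = sorted(death_times)
    n = len(ordered)
    # With 0-based index i, n - i individuals are at risk before ordered[i].
    return sum(gamma_kernel(x, b, t) / (n - i) for i, t in enumerate(ordered))


# Illustration on exponential lifetimes, whose true hazard is constant (0.5).
rng = random.Random(42)
sample = [rng.expovariate(0.5) for _ in range(2000)]
estimate = hazard_estimate(1.0, 0.05, sample)
print(f"estimated hazard at x=1.0: {estimate:.3f} (true value 0.5)")
```

The bandwidth `b = 0.05` is an arbitrary illustrative value; choosing it from the data is exactly what the minimax bandwidth procedure of the paper addresses.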

Figures (6)

  • Figure 1: Comparison of the kernel estimation on two hazard rates. Estimation methods are Gamma (Gam), Gaussian with cross-validation bandwidth (Gaus), $50$ nearest neighbor bandwidth Gaussian kernel (NNG) and log-normal ratio (LN) for the specified values of bandwidth and a sample size of $2000$.
  • Figure 2: Local minimax bandwidth estimator and close-up of the chosen bandwidth for $m=2000$, on two hazard rates of the form $k(t) = a + f_1(t) + f_2(t)$ with $a = 7\cdot 10^{-3}$ and $f_1$ and $f_2$ Gaussian densities of same sd $15$ (left plot) and $5$ (right plot), centered around $0$ and $150$ respectively.
  • Figure 3: Schematic representation of the two-phased model.
  • Figure 4: Kernel estimates of the death hazard rate in smurf flies, using the Gamma kernel with the minimax bandwidth procedure and the Gaussian kernel with cross-validation bandwidth.
  • Figure 5: Comparison of bandwidth choice methods on a hazard rate $k(t) = a+c\cdot e^{-dt}$, $a = 7\cdot 10^{-3}, c = 3 \cdot 10^{-2}, d = 7 \cdot 10^{-2}$.
  • ...and 1 more figure
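
The hazard rate in the Figure 5 caption, $k(t) = a + c\cdot e^{-dt}$, has a closed-form cumulative hazard $H(t) = at + \frac{c}{d}\left(1 - e^{-dt}\right)$, so lifetimes can be simulated by solving $H(T) = E$ with $E$ a standard exponential variable. The sketch below illustrates this inversion with the parameter values from the caption. The simulation scheme, the bisection solver, the sample size of 2000 and the function names are assumptions made for illustration; the page does not say how the paper's simulated data were generated.

```python
import math
import random

# Parameter values quoted in the Figure 5 caption.
A, C, D = 7e-3, 3e-2, 7e-2


def hazard(t):
    """Hazard rate k(t) = a + c * exp(-d t)."""
    return A + C * math.exp(-D * t)


def cumulative_hazard(t):
    """H(t) = integral of k over [0, t] = a t + (c/d) (1 - exp(-d t))."""
    return A * t + (C / D) * (1.0 - math.exp(-D * t))


def sample_lifetime(rng):
    """Draw T with P(T > t) = exp(-H(t)) by solving H(T) = E, E ~ Exp(1)."""
    target = rng.expovariate(1.0)
    lo, hi = 0.0, 1.0
    # H is increasing and unbounded (H(t) >= a t), so doubling finds a bracket.
    while cumulative_hazard(hi) < target:
        hi *= 2.0
    # Plain bisection on the bracket [lo, hi].
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if cumulative_hazard(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
lifetimes = [sample_lifetime(rng) for _ in range(2000)]
print(f"simulated {len(lifetimes)} lifetimes, median {sorted(lifetimes)[1000]:.1f}")
```

A sample produced this way can then be passed to any hazard-rate estimator to compare bandwidth choices against the known true hazard.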

Theorems & Definitions (39)

  • Definition 2.1: Associated kernel
  • Remark 2.1
  • Definition 2.2: Gamma kernel without interior bias
  • Remark 2.2
  • Proposition 2.1
  • Lemma 2.1
  • proof
  • Proposition 3.1
  • Proposition 3.2: Bias
  • Proposition 3.3: Variance and consistency
  • ...and 29 more