Nonparametric Least Squares Estimators for Interval Censoring

Piet Groeneboom

TL;DR

The paper addresses the open problem of limit distributions for the nonparametric MLE in interval censoring with multiple observation times per subject (case 2, non-separated). It introduces and analyzes two isotonic nonparametric least-squares estimators, proves consistency, and derives a Brownian-motion-with-drift type limit for the main LS estimator, together with a parallel limit for a simpler one-step LS variant. It uses smooth functional theory to study the asymptotic behavior of smooth functionals and compares the LS estimators to the MLE quantitatively via simulations. The findings show an $n^{1/3}$ convergence rate at a fixed point with a specific Brownian-minimizer limit. While the MLE’s conjectured faster rate remains unobserved for moderate samples, the LS estimator often exhibits smaller pointwise variance in practice. The work provides practical computational methods (iterative convex minorant) and a rigorous asymptotic framework for interval-censoring problems. Overall, the results offer a consistent, computable alternative to the MLE for interval censoring in the non-separated regime and lay out a detailed smooth-functional approach to their asymptotics, including a complete treatment for a simpler LS variant.

Abstract

The limit distribution of the nonparametric maximum likelihood estimator for interval censored data with more than one observation time per unobservable observation is still unknown in general. For the so-called separated case, where the observation times are at a distance larger than a fixed positive epsilon, the limit distribution was derived in [5]. For the non-separated case there is a conjectured limit distribution, given in [10], Section 5.2 of Part 2. Whether this conjecture holds is still unknown, but the present paper shows that for sample sizes 1000 and 10,000 this limit behavior is still not clearly seen. We prove consistency of a related nonparametric isotonic least squares estimator and sketch the proof of its limit distribution. We also provide simulation results to show how the nonparametric MLE and the least squares estimator behave in comparison. Moreover, we discuss a simpler least squares estimator that can be computed in one step, but is inferior to the other least squares estimator, since it does not use all information. For the simplest model of interval censoring, the current status model, the nonparametric maximum likelihood and least squares estimators are the same. This equivalence breaks down if there are more observation times per unobservable observation. The computations for the simulation of the more complicated interval censoring model were performed using the iterative convex minorant algorithm. They are provided in the GitHub repository [7].

Paper Structure

This paper contains 9 sections, 8 theorems, 113 equations, 8 figures.

Key Result

Lemma 1

[Characterization of the nonparametric ML estimator in the current status model] Consider the cumulative sum (cusum) diagram consisting of the points $P_0=(0,0)$ and where the $\Delta_i$'s correspond to the $T_i$'s, which are supposed to be ordered $0<T_1<\dots<T_n$ (one can also allow ties, but we disregard this further complication here). Then the nonparametric MLE $\hat{F}_n(T_i)$ is given by
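
The characterization in Lemma 1 can be illustrated with a minimal sketch. It assumes the standard form of the result, which is not displayed above: the cusum points are $P_i=(i,\sum_{j\le i}\Delta_j)$ for $i=1,\dots,n$, and $\hat{F}_n(T_i)$ is the left-hand slope of the greatest convex minorant of the diagram at $P_i$. The function name `current_status_mle` and the toy indicator vector are hypothetical, and the slopes are computed with the pool-adjacent-violators scheme rather than with the code from repository [7].

```python
import numpy as np

def current_status_mle(delta):
    """Left-hand slopes of the greatest convex minorant of the cusum diagram
    P_0 = (0, 0), P_i = (i, sum_{j<=i} delta_j), assuming the indicators
    delta_i are already sorted by their observation times T_i (no ties)."""
    blocks = []  # each block is [sum of indicators, number of points]
    for d in delta:
        blocks.append([float(d), 1])
        # Pool adjacent blocks while their means violate monotonicity:
        # mean(previous) >= mean(last), compared without division.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] >= blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    if not blocks:
        return np.array([])
    # Every point in a block gets the block mean, i.e. the slope of the minorant.
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

# Toy example (hypothetical data): indicators Delta_i = 1{X_i <= T_i}.
delta = [0, 1, 0, 0, 1, 1]
F_hat = current_status_mle(delta)
print(F_hat)
```

The output is a nondecreasing step function evaluated at the ordered $T_i$. In the current status model the same vector also minimizes $\sum_i (\Delta_i - F(T_i))^2$ over nondecreasing $F$, which is the equivalence of the ML and LS estimators mentioned in the abstract.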

Figures (8)

  • Figure 1: (a) Nonparametric MLE (blue) of $F_0$ for a sample of size $n=1000$. (b) Nonparametric least squares estimate (blue) of $F_0$, minimizing (\ref{LS_criterion_IC}), for the same sample. The solid black curve shows $F_0$.
  • Figure 2: (a) Simulated variances, times $n^{2/3}$, of the nonparametric MLE (black solid curve) and the least squares estimate minimizing (\ref{LS_criterion_IC}) (red), for $t_i=0.1,0.2,\dots,1.9$, linearly interpolated between values at the $t_i$ for the model of Example \ref{example1}. The blue dashed curve is the theoretical limit curve one obtains from Theorem \ref{th:limit_LS} in Section \ref{sec:LS} for the LS estimator. The simulated variances are based on $10{,}000$ simulations of samples of size $n=1000$ for the truncated exponential distribution function $F_0$ on $[0,2]$ and the order statistics of the uniform distribution on $[0,2]^2$ as observation times. (b) The same comparison, but now for $F_0$ uniform on $[0,2]$.
  • Figure 3: (a) Simulated variances, times $n^{2/3}$, of the simple nonparametric LS estimator, minimizing (\ref{LS_criterion_IC2a}) (black solid curve), and the least squares estimator, minimizing (\ref{LS_criterion_IC}) (red), for $t_i=0.1,0.2,\dots,1.9$, linearly interpolated between values at the $t_i$ for the model of Example \ref{example1}. The blue dashed and purple dotted curves are the theoretical limit curves discussed in Section \ref{sec:LS} for the LS estimators minimizing (\ref{LS_criterion_IC2a}) and (\ref{LS_criterion_IC}), respectively. The simulated variances are based on $10{,}000$ simulations of samples of size $n=10{,}000$ for the truncated exponential distribution function $F_0$ on $[0,2]$ and the order statistics of the uniform distribution on $[0,2]^2$ as observation times. (b) The same comparison, but now for $F_0$ uniform on $[0,2]$.
  • Figure 4: The process $W_{n,\hat{F}_n}$ as a function of the $2n$ ordered observations $U_i$ and $V_i$ for Example \ref{example1} and $n=100$. For this example $\lambda_{1,\hat{F}_n}=0.003148$ and $\lambda_{2,\hat{F}_n}=0.014758$.
  • Figure 5: (a) Simulated variances, times $n^{2/3}$, of the nonparametric MLE (black solid curve) and the least squares estimate minimizing (\ref{LS_criterion_IC}) (red), for $t_i=0.1,0.2,\dots,1.9$, linearly interpolated between values at the $t_i$ for the model of Example \ref{example1}. The blue dashed curve is the theoretical limit curve one obtains from Theorem \ref{th:limit_LS} below. The simulated variances are based on $10{,}000$ simulations of samples of size $n=10{,}000$ for the truncated exponential distribution function $F_0$ on $[0,2]$ and the order statistics of the uniform distribution on $[0,2]^2$ as observation times. (b) The same comparison, but now for $F_0$ uniform on $[0,2]$.
  • ...and 3 more figures

Theorems & Definitions (17)

  • Lemma 1
  • Remark 1
  • Example 1
  • Lemma 2
  • Theorem 1
  • Remark 2
  • Lemma 3
  • proof
  • Lemma 4
  • Theorem 2
  • ...and 7 more