Table of Contents
Fetching ...

Confidence Intervals and Simultaneous Confidence Bands Based on Deep Learning

Asaf Ben Arie, Malka Gorfine

TL;DR

A valid non-parametric bootstrap method that correctly disentangles data uncertainty from the noise inherent in the adopted optimization algorithm, ensuring that the resulting point-wise confidence intervals or the simultaneous confidence bands are accurate (i.e., valid and not overly conservative).

Abstract

Deep learning models have significantly improved prediction accuracy in various fields, gaining recognition across numerous disciplines. Yet, an aspect of deep learning that remains insufficiently addressed is the assessment of prediction uncertainty. Producing reliable uncertainty estimators could be crucial in practical terms. For instance, predictions associated with a high degree of uncertainty could be sent for further evaluation. Recent works in uncertainty quantification of deep learning predictions, including Bayesian posterior credible intervals and a frequentist confidence-interval estimation, have proven to yield either invalid or overly conservative intervals. Furthermore, there is currently no method for quantifying uncertainty that can accommodate deep neural networks for survival (time-to-event) data that involves right-censored outcomes. In this work, we provide a valid non-parametric bootstrap method that correctly disentangles data uncertainty from the noise inherent in the adopted optimization algorithm, ensuring that the resulting point-wise confidence intervals or the simultaneous confidence bands are accurate (i.e., valid and not overly conservative). The proposed ad-hoc method can be easily integrated into any deep neural network without interfering with the training process. The utility of the proposed approach is illustrated by constructing simultaneous confidence bands for survival curves derived from deep neural networks for survival data with right censoring.

Confidence Intervals and Simultaneous Confidence Bands Based on Deep Learning

TL;DR

A valid non-parametric bootstrap method that correctly disentangles data uncertainty from the noise inherent in the adopted optimization algorithm, ensuring that the resulting point-wise confidence intervals or the simultaneous confidence bands are accurate (i.e., valid and not overly conservative).

Abstract

Deep learning models have significantly improved prediction accuracy in various fields, gaining recognition across numerous disciplines. Yet, an aspect of deep learning that remains insufficiently addressed is the assessment of prediction uncertainty. Producing reliable uncertainty estimators could be crucial in practical terms. For instance, predictions associated with a high degree of uncertainty could be sent for further evaluation. Recent works in uncertainty quantification of deep learning predictions, including Bayesian posterior credible intervals and a frequentist confidence-interval estimation, have proven to yield either invalid or overly conservative intervals. Furthermore, there is currently no method for quantifying uncertainty that can accommodate deep neural networks for survival (time-to-event) data that involves right-censored outcomes. In this work, we provide a valid non-parametric bootstrap method that correctly disentangles data uncertainty from the noise inherent in the adopted optimization algorithm, ensuring that the resulting point-wise confidence intervals or the simultaneous confidence bands are accurate (i.e., valid and not overly conservative). The proposed ad-hoc method can be easily integrated into any deep neural network without interfering with the training process. The utility of the proposed approach is illustrated by constructing simultaneous confidence bands for survival curves derived from deep neural networks for survival data with right censoring.
Paper Structure (14 sections, 17 equations, 5 figures, 2 tables)

This paper contains 14 sections, 17 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Simulation results of empirical coverage rates, Settings 1--5 with $n=10,000$.
  • Figure 2: Simulation results of empirical width, Settings 1--5 with $n=10,000$.
  • Figure 3: Simulation results of empirical coverage rates, Settings 1--5 with varied size of training and validation dataset, $n$.
  • Figure 4: Simulation results of empirical mean width, Settings 1--5 with varied size of training and validation dataset, $n$.
  • Figure 5: Examples of survival curves and simultaneous confidence bands produced by the naive and ensemble-based bootstrap methods.