Learning Survival Models with Right-Censored Reporting Delays

Yuta Shikuri, Hironori Fujisawa

TL;DR

This work addresses learning survival models when reporting delays leave event occurrences unobserved because their reports are right-censored, a common problem in insurance. It jointly models the hazards of the time to an accident and of the subsequent reporting delay, marginalizes the joint distribution over the latent event status, and proves that the resulting estimator is consistent; an EM-based algorithm with Monte Carlo imputation makes estimation practical. To handle administrative censoring in newly enrolled cohorts, the authors propose a two-stage estimation procedure that transfers information from a source domain unaffected by administrative censoring, yielding consistent risk evaluations in the target domain. Experiments on toy and real insurance datasets show more timely and more accurate risk assessment for recently enrolled cohorts. The result is a computationally feasible route to timely, individualized risk estimation under both kinds of censoring.

Abstract

Survival analysis is a statistical technique used to estimate the time until an event occurs. Although it is applied across a wide range of fields, adjusting for reporting delays under practical constraints remains a significant challenge in the insurance industry. Such delays render event occurrences unobservable when their reports are subject to right censoring. This issue becomes particularly critical when estimating hazard rates for newly enrolled cohorts with limited follow-up due to administrative censoring. Our study addresses this challenge by jointly modeling the parametric hazard functions of event occurrences and report timings. The joint probability distribution is marginalized over the latent event occurrence status. We construct an estimator for the proposed survival model and establish its asymptotic consistency. Furthermore, we develop an expectation-maximization algorithm to compute its estimates. Using these findings, we propose a two-stage estimation procedure based on a parametric proportional hazards model to evaluate observations subject to administrative censoring. Experimental results demonstrate that our method effectively improves the timeliness of risk evaluation for newly enrolled cohorts.

Paper Structure

This paper contains 33 sections, 13 theorems, 63 equations, 3 figures, 4 tables, and 2 algorithms.

Key Result

Lemma 5.1

For $y \geq 0$ and $v = 0$, the joint distribution of $(Y, V)$ is expressed through $S_{\circ}(y) \equiv S_1(y) + \int_0^y f_1(t) S_2(y - t) \, dt$. For $0 \leq z \leq y$ and $v = 1$, the joint distribution of $(Z, Y, V)$ is expressed through $f_{\circ}(z, y) \equiv f_1(z) f_2(y - z)$.
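The quantity $S_{\circ}(y)$ can be read as the probability that no report has arrived by time $y$: either no accident has occurred yet, or one occurred at time $t$ and its report is still pending. The following minimal sketch evaluates the definition numerically and compares it with a closed form. The exponential choices for $f_1$, $S_1$, $S_2$, the rates `a` and `b`, and the function names are illustrative assumptions, not the paper's parametrization.

```python
import math

def s_circ(y, f1, S1, S2, n=2000):
    """S_circ(y) = S1(y) + int_0^y f1(t) * S2(y - t) dt (trapezoidal rule)."""
    if y == 0:
        return S1(0.0)
    h = y / n
    # Endpoint terms of the trapezoidal rule for g(t) = f1(t) * S2(y - t).
    total = 0.5 * (f1(0.0) * S2(y) + f1(y) * S2(0.0))
    for k in range(1, n):
        t = k * h
        total += f1(t) * S2(y - t)
    return S1(y) + h * total

# Assumed for illustration: exponential accident time (rate a)
# and exponential reporting delay (rate b), with a != b.
a, b = 0.3, 1.5
f1 = lambda t: a * math.exp(-a * t)   # accident-time density
S1 = lambda t: math.exp(-a * t)       # accident-time survival function
S2 = lambda u: math.exp(-b * u)       # reporting-delay survival function

def s_circ_closed(y):
    """Closed form under the exponential assumption: P(accident time + delay > y)."""
    return (b * math.exp(-a * y) - a * math.exp(-b * y)) / (b - a)

for y in (0.0, 0.5, 2.0, 10.0):
    print(y, s_circ(y, f1, S1, S2), s_circ_closed(y))
```

Under these assumptions the two columns should agree to several decimal places, and $S_{\circ}(y) \geq S_1(y)$ for every $y$, since unreported accidents add to the mass of unobserved events.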

Figures (3)

  • Figure 1: Overview of problem setting. There are two processes: one leading up to an accident and another from the accident to its corresponding report. Without a report, the accident information enclosed in parentheses is not observed. The main objective of this study is to estimate the accident hazard function for the timely risk evaluation of a newly enrolled cohort with homogeneous risk. Given the limited observation window for such cohorts, the parameter estimator in the survival model is required to achieve asymptotic consistency even under administrative censoring.
  • Figure 2: Overview of the two-stage estimation procedure. The target domain, representing a newly enrolled cohort, is subject to administrative censoring. Due to identifiability limitations inherent in survival models, the covariate effect in the target domain cannot be estimated without supplementary information. To address this challenge, we consider a setting in which supplementary data from a source domain unaffected by administrative censoring are available.
  • Figure 3: Notations for observations. The figure illustrates three possible patterns of accident and report status under a common right censoring time. In the top two patterns where $v_i = 0$, the accident status $(z_i, w_i)$ remains unobserved.
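The three observation patterns described for Figure 3 can be illustrated with a small simulation. This is a sketch under stated assumptions: exponential accident times and reporting delays, a common censoring time `c`, and hypothetical field names (`pattern`, `v`, `y`, `z`) that are not taken from the paper.

```python
import math
import random

def simulate(n, a, b, c, seed=0):
    """Draw n subjects and keep only what is observable at censoring time c."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.expovariate(a)        # latent accident time
        r = z + rng.expovariate(b)    # latent report time
        if r <= c:
            # Reported before censoring: accident time is observed, v = 1.
            rows.append({"pattern": "reported", "v": 1, "y": r, "z": z})
        elif z <= c:
            # Accident occurred but its report is censored: v = 0, z hidden.
            rows.append({"pattern": "accident_unreported", "v": 0, "y": c, "z": None})
        else:
            # No accident before censoring: v = 0, nothing to observe.
            rows.append({"pattern": "no_accident", "v": 0, "y": c, "z": None})
    return rows

a, b, c = 0.3, 1.5, 2.0               # assumed rates and censoring time
rows = simulate(20000, a, b, c)
frac_unreported = sum(1 for row in rows if row["v"] == 0) / len(rows)

# Probability of no report by time c under the exponential assumption.
expected = (b * math.exp(-a * c) - a * math.exp(-b * c)) / (b - a)
print(frac_unreported, expected)
```

The two patterns with `v = 0` are indistinguishable in the observed data, which is why the joint distribution has to be marginalized over the latent accident status.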

Theorems & Definitions (16)

  • Lemma 5.1
  • Theorem 5.3
  • Proposition 5.5
  • Lemma 5.6
  • Theorem 5.9
  • Proposition 5.11
  • Theorem 5.12
  • Theorem 5.13
  • Proposition 5.14
  • Theorem 5.16
  • ...and 6 more