Learning Survival Models with Right-Censored Reporting Delays
Yuta Shikuri, Hironori Fujisawa
TL;DR
This work addresses learning survival models when reporting delays right-censor event observations, a common problem in insurance. It jointly models the hazards of the time to an accident and of its reporting delay, marginalizes the joint distribution over the latent event status, and proves that the resulting estimator is consistent; an EM-based algorithm with Monte Carlo imputation makes the method practical. To handle administrative censoring in newly enrolled cohorts, the authors propose a two-stage estimation procedure that transfers information from a source domain without censoring, yielding consistent risk evaluations in the target domain. Experiments on toy and real insurance datasets show improved timeliness and accuracy of risk assessment for recently enrolled cohorts.
Abstract
Survival analysis is a statistical technique used to estimate the time until an event occurs. Although it is applied across a wide range of fields, adjusting for reporting delays under practical constraints remains a significant challenge in the insurance industry. Such delays render event occurrences unobservable when their reports are subject to right censoring. This issue becomes particularly critical when estimating hazard rates for newly enrolled cohorts with limited follow-up due to administrative censoring. Our study addresses this challenge by jointly modeling the parametric hazard functions of event occurrences and report timings. The joint probability distribution is marginalized over the latent event occurrence status. We construct an estimator for the proposed survival model and establish its asymptotic consistency. Furthermore, we develop an expectation-maximization algorithm to compute its estimates. Using these findings, we propose a two-stage estimation procedure based on a parametric proportional hazards model to evaluate observations subject to administrative censoring. Experimental results demonstrate that our method effectively improves the timeliness of risk evaluation for newly enrolled cohorts.
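
To make the marginalization over the latent event status concrete, the following is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the event time T and the reporting delay D are independent and exponential with constant hazards `lam` and `mu`, and follow-up ends at an administrative censoring time `c`. A record with no report by `c` could mean either that no event occurred or that an event occurred but has not yet been reported, so its likelihood contribution sums both cases. All function names and the record layout are hypothetical.

```python
import math
import random


def prob_unreported(lam, mu, c):
    """P(no report by c) = P(T > c) + P(T <= c, T + D > c).

    Assumes constant hazards lam (event) and mu (report delay);
    this exponential form is an illustrative assumption only.
    """
    no_event = math.exp(-lam * c)
    if abs(lam - mu) < 1e-12:
        # Limit of the general expression as mu -> lam.
        hidden_event = lam * c * math.exp(-lam * c)
    else:
        hidden_event = lam * (math.exp(-mu * c) - math.exp(-lam * c)) / (lam - mu)
    return no_event + hidden_event


def log_likelihood(lam, mu, records):
    """Marginal log-likelihood over records (c, t, d).

    t and d are None when no report arrived by c, in which case the
    latent event status is summed out via prob_unreported.
    """
    total = 0.0
    for c, t, d in records:
        if t is None:
            total += math.log(prob_unreported(lam, mu, c))
        else:
            # Reported event: both the event time and the delay are observed.
            total += math.log(lam) - lam * t + math.log(mu) - mu * d
    return total


def simulate(lam, mu, c, n, seed=0):
    """Draw n records; a report is observed only if t + d <= c."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        t = rng.expovariate(lam)
        d = rng.expovariate(mu)
        records.append((c, t, d) if t + d <= c else (c, None, None))
    return records
```

For example, `log_likelihood(0.5, 2.0, simulate(0.5, 2.0, 1.0, 20000))` evaluates the marginal log-likelihood at the simulating parameters. In this sketch the marginalized term has a closed form; with more flexible parametric hazards it generally would not, which is one reason an EM scheme with imputed latent statuses is a natural fit.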
