Inference in pseudo-observation-based regression using (biased) covariance estimation and naive bootstrapping
Simon Mack, Morten Overgaard, Dennis Dobler
TL;DR
This paper investigates inference for pseudo-observation regression in time-to-event settings by examining covariance estimation and resampling. It shows that the standard Huber-White variance estimator is generally not consistent for the limiting covariance of the parameter estimates, and proves consistency of a plug-in PV covariance estimator, which enables valid Wald-type tests for general linear hypotheses. Although the naive bootstrap fails for covariance estimation in this framework, it can still support hypothesis testing when combined with a suitable studentization. The PV-based approach provides better type I error control and power, particularly in small samples or with censoring. The work also establishes a uniform law of large numbers for U- and V-statistics that underpins these results, and demonstrates the findings through simulations and a real data analysis of 90-day survival in a lung cancer cohort. Overall, the recommended practice is to use the corrected PV estimator for covariance estimation in pseudo-observation regression and to restrict the naive bootstrap to studentized hypothesis tests rather than using it to estimate the covariance.
Abstract
We demonstrate that the usual Huber-White estimator is not consistent for the limiting covariance of parameter estimates in pseudo-observation regression approaches. By confirming that a plug-in estimator can be used instead, we obtain asymptotically exact and consistent tests for general linear hypotheses in the parameters of the model. Additionally, we confirm that naive bootstrapping cannot be used for covariance estimation in the pseudo-observation model either. However, it can be used for hypothesis testing when a suitable studentization is applied. Simulations illustrate the good performance of our proposed methods in many scenarios. Finally, we obtain a general uniform law of large numbers for U- and V-statistics, as such statistics are central to the mathematical analysis of the inference procedures developed in this work.
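For readers unfamiliar with the setting, the following is a minimal sketch of how jackknife pseudo-observations for a survival probability at a fixed time point can be computed from the Kaplan-Meier estimator. It is an illustration under stated assumptions, not the authors' implementation: the function names `km_survival` and `pseudo_observations`, the time point `t0`, and the toy data are all hypothetical.

```python
import numpy as np


def km_survival(times, events, t0):
    """Kaplan-Meier estimate of S(t0) = P(T > t0).

    times  : observed times (event or censoring)
    events : 1/True if the event was observed, 0/False if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors over the distinct
    # observed event times up to and including t0.
    for u in np.unique(times[events & (times <= t0)]):
        at_risk = np.sum(times >= u)
        n_events = np.sum((times == u) & events)
        surv *= 1.0 - n_events / at_risk
    return surv


def pseudo_observations(times, events, t0):
    """Jackknife pseudo-observations n*theta_hat - (n-1)*theta_hat^(-i)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    full = km_survival(times, events, t0)
    keep = ~np.eye(n, dtype=bool)  # row i is a mask that drops subject i
    leave_one_out = np.array(
        [km_survival(times[keep[i]], events[keep[i]], t0) for i in range(n)]
    )
    return n * full - (n - 1) * leave_one_out


# Hypothetical toy data: five subjects, the second one censored at time 2.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 1, 1]
toy_pseudo = pseudo_observations(toy_times, toy_events, t0=2.5)
```

Without censoring, these pseudo-observations reduce to the indicators of survival beyond `t0`. In a pseudo-observation regression they serve as responses in an estimating equation, and the covariance of the resulting parameter estimates is the quantity whose estimation is studied here.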
