Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis)

Alexander Mangulad Christgau

TL;DR

The thesis advances model-free, estimand-driven inference for event-history data by introducing two measures. The Local Covariance Measure (LCM) is used to test conditional local independence without specifying parametric models. The Aalen Covariance Measure (ACM) quantifies conditional associations in time-to-event settings. The thesis develops an estimation framework based on double machine learning (DML) and cross-fitting, establishes $\sqrt{n}$-consistency and uniform asymptotic results for LCM estimators, and constructs a Local Covariance Test (LCT) whose type I error and power against local alternatives can be controlled uniformly. A separate covariate-adjustment framework yields the DOPE estimator for efficient treatment-effect estimation, with a neural-network implementation and empirical demonstrations. The work also gives a detailed treatment of nuisance estimation and alternatives to cross-fitting, supported by an extensive set of proofs and simulations. Together, these provide model-free tools for robust inference in survival and counting-process settings, with practical implications for causal and predictive analyses in epidemiology and related fields.

Abstract

This thesis contains a series of independent contributions to statistics, unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. Mathematical insights are obtained from concrete examples, and these insights are generalized to principles that permeate the rest of the thesis. The second chapter studies the concept of local independence, which describes whether the evolution of one stochastic process is directly influenced by another. To test local independence, we define a model-free parameter called the Local Covariance Measure (LCM). We formulate an estimator for the LCM, from which a test of local independence is proposed. We discuss how the size and power of the proposed test can be controlled uniformly and investigate the test in a simulation study. The third chapter focuses on covariate adjustment, a method used to estimate the effect of a treatment by accounting for observed confounding. We formulate a general framework that facilitates adjustment for any subset of covariate information. We identify the optimal covariate information for adjustment and, based on this, introduce the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of treatment effects. An instance of DOPE is implemented using neural networks, and we demonstrate its performance on simulated and real data. The fourth and final chapter introduces a model-free measure of the conditional association between an exposure and a time-to-event, which we call the Aalen Covariance Measure (ACM). We develop a model-free estimation method and show that it is doubly robust, ensuring $\sqrt{n}$-consistency provided that the nuisance functions can be estimated with modest rates. A simulation study demonstrates the use of our estimator in several settings.
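
The phrase "modest rates" in the summary of the final chapter can be made concrete. For many doubly robust estimators, the remainder term is bounded by the product of the errors of two nuisance estimators. Assume the ACM estimator shares this structure. This is an illustrative assumption, and the thesis's exact conditions may differ. Write $\widehat{\psi}$ for an estimator of a target $\psi$, and $\widehat{g}, \widehat{h}$ for estimators of two generic nuisance functions $g, h$. Write $\varphi$ for a mean-zero influence function of the observations $O_i$. All of this notation is hypothetical. The typical argument runs:

$$
\sqrt{n}\,(\widehat{\psi}-\psi) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\varphi(O_i) + O_P\!\left(\sqrt{n}\,\|\widehat{g}-g\|\,\|\widehat{h}-h\|\right) + o_P(1).
$$

If both nuisance errors are $o_P(n^{-1/4})$, the middle term is $o_P(1)$, and $\sqrt{n}(\widehat{\psi}-\psi)$ is asymptotically normal by the central limit theorem. For instance, at $n = 10\,000$, two nuisance errors of size $n^{-1/4} = 0.1$ multiply to $0.01 = n^{-1/2}$, the scale of the estimator's standard error. Rates slightly faster than $n^{-1/4}$ therefore make the product negligible, even though neither nuisance is estimated at the parametric rate $n^{-1/2}$.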

Paper Structure

This paper contains 103 sections, 69 theorems, 530 equations, 20 figures, 3 tables, 6 algorithms.

Key Result

Lemma 2.2.1

Let $\check{\tau}$ and $(\widehat{\tau}_k)_{k\in[K]}$ be the estimators defined in \eqref{eq:DML}, and let $\sigma_P^2>0$. Suppose that, for each $k\in [K]$, …, where $U_k$ and $R_k$ are random variables satisfying $U_k \xrightarrow{d} \mathrm{N}(0,\sigma_P^2)$ and $R_k \xrightarrow{P} 0$ as $n\to \infty$. Assume also that $U_1, \ldots, U_K$ are jointly independent, $U_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp U_K$ …
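
The following is a minimal sketch of the DML1 scheme that the lemma concerns. It rests on our own assumptions:

- $\check{\tau}$ is taken to be the average of the fold-wise estimates $\widehat{\tau}_k$, which is the usual DML1 construction.
- The model is a partially linear model $Y = \theta D + g(X) + \varepsilon$, and the nuisance regressions are one-dimensional OLS fits. Both are chosen purely for illustration.
- The names `ols_1d` and `dml1_partially_linear` are hypothetical.

The thesis's estimands, estimators and learners are not reproduced here. Each fold-wise estimate uses only nuisance fits from the other folds. This is the mechanism that typically keeps remainder terms such as $R_k$ small.

```python
import random
import statistics


def ols_1d(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dml1_partially_linear(x, d, y, n_folds=5, seed=0):
    """DML1-style cross-fitted estimate of theta in Y = theta*D + g(X) + eps.

    Assumption for illustration: E[Y|X] and E[D|X] are fitted by 1-D OLS on the
    out-of-fold data; any flexible regressor could be swapped in instead.
    Returns (average of fold estimates, list of fold estimates).
    """
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    fold_estimates = []
    for k, test in enumerate(folds):
        # Nuisances are trained on all folds except fold k.
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        ay, by = ols_1d([x[i] for i in train], [y[i] for i in train])
        ad, bd = ols_1d([x[i] for i in train], [d[i] for i in train])
        # Residualise Y and D on the held-out fold only.
        ry = [y[i] - (ay + by * x[i]) for i in test]
        rd = [d[i] - (ad + bd * x[i]) for i in test]
        # Fold-wise partialling-out estimate of theta.
        fold_estimates.append(sum(a * b for a, b in zip(ry, rd)) / sum(b * b for b in rd))
    # DML1: average the K fold-wise estimates.
    return statistics.fmean(fold_estimates), fold_estimates


# Synthetic data with true theta = 2 and a confounder X affecting both D and Y.
rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_check, per_fold = dml1_partially_linear(x, d, y, n_folds=5)
print(round(theta_check, 3), [round(t, 3) for t in per_fold])
```

The lemma analyses this averaging step, as opposed to solving one pooled estimating equation as in the DML2 variant. If the scaled fold errors are asymptotically independent Gaussians with a common variance, their average is again asymptotically Gaussian.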

Figures (20)

  • Figure 3.1.1: Local independence graph illustrating a dependence structure among the three processes $X$, $Z$ and $N$. Here $N$ is the indicator of death for an individual, $X$ is their cumulative pension savings and $Z$ is a covariate process. All nodes in this graph have implicit self-loops. There is no edge from $X$ to $N$, which indicates that death is not directly influenced by pension savings. This can be formalized as $N$ being conditionally locally independent of $X$, which is the hypothesis we aim to test.
  • Figure 3.2.2: Local independence graphs illustrating how the three processes $X$, $Y$, and $Z$ could affect each other and time of death in the Cox example. In neither graph does $X$ (pension savings) directly influence time of death. In the left graph, the death indicator is in addition conditionally locally independent of $X$ given the histories of $Z$ and $N$. In the right graph, $Z$ and $N$ do not block all paths from $X$ to $N$, so conditioning only on the histories of $Z$ and $N$ does not render $N$ conditionally locally independent of $X$.
  • Figure 3.2.3: Histograms of the distributions of three different estimators of $\gamma_1$. Each histogram contains 1000 estimates fitted to samples of size $n=500$. The samples were drawn from a model that satisfies the hypothesis of conditional local independence, so the ground truth is $\gamma_1=0$. See Section \ref{sec:SamplingScheme} for further details of the data-generating process.
  • Figure 3.2.4: A time-dependent extension of Figure 3.2.3, showing the distributions of the sample paths $t \mapsto \widehat{\gamma}_{t, \mathrm{plug-in}}^{(500)}$ and $t \mapsto \widehat{\gamma}_{t, \mathrm{double}}^{(500)}$. The latter is shown both with and without cross-fitting. The data were simulated under $H_0$, where $t\mapsto \gamma_t$ is the zero function. See Section \ref{sec:SamplingScheme} for further details of the data-generating process.
  • Figure 3.6.5: Empirical cumulative distribution functions of simulated $p$-values for the cross-fitted local covariance test and the hazard ratio test. The simulated data satisfy the hypothesis of conditional local independence, so the $p$-values should be uniformly distributed and the empirical CDFs should lie on the dotted diagonal.
  • ...and 15 more figures
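
Figure 3.6.5 assesses calibration by comparing the empirical CDF of null $p$-values with the diagonal. Below is a minimal sketch of that check. The $p$-values are synthetic stand-ins, not the thesis's simulation output. The sketch computes two quantities:

- the Kolmogorov-Smirnov distance $\sup_t |F_n(t) - t|$ to the Uniform(0, 1) CDF;
- the empirical rejection rate $F_n(\alpha)$ at level $\alpha = 0.05$.

The helper names `ecdf` and `ks_distance_to_uniform` are hypothetical.

```python
import bisect
import random


def ecdf(values):
    """Empirical CDF of `values`: returns t -> fraction of values <= t."""
    xs = sorted(values)
    return lambda t: bisect.bisect_right(xs, t) / len(xs)


def ks_distance_to_uniform(pvalues):
    """sup_t |F_n(t) - t|: the Kolmogorov-Smirnov distance between the empirical
    CDF of the p-values and the Uniform(0, 1) CDF (the diagonal)."""
    xs = sorted(pvalues)
    n = len(xs)
    # The supremum is attained at a jump: just after it ((i+1)/n - x) or just before it (x - i/n).
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)
calibrated = [rng.random() for _ in range(1000)]             # uniform, as under H0
anticonservative = [rng.random() ** 2 for _ in range(1000)]  # too many small p-values

for name, p in [("calibrated", calibrated), ("anticonservative", anticonservative)]:
    size = ecdf(p)(0.05)  # empirical rejection rate at alpha = 0.05
    print(f"{name}: KS distance {ks_distance_to_uniform(p):.3f}, rejection rate {size:.3f}")
```

For $p = U^2$ with $U$ uniform, $P(p \le t) = \sqrt{t}$. The population distance is therefore $\sup_t(\sqrt{t} - t) = 1/4$, attained at $t = 1/4$. The size at $\alpha = 0.05$ is $\sqrt{0.05} \approx 0.22$. A calibrated test should instead show a distance of order $1/\sqrt{n}$ and a rejection rate near $0.05$.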

Theorems & Definitions (145)

  • Example 2.0.1
  • Lemma 2.2.1: Asymptotics of dml1 estimator
  • proof
  • Definition 3.2.1: Conditional local independence
  • Remark 3.2.2: Censoring
  • Definition 3.2.3: Residual Process
  • Definition 3.2.4: Local Covariance Measure
  • Proposition 3.2.5
  • Proposition 3.2.6
  • Proposition 3.4.1
  • ...and 135 more