CRTRE: Causal Rule Generation with Target Trial Emulation Framework

Junda Wang, Weijian Li, Han Wang, Hanjia Lyu, Caroline P. Thirukumaran, Addisu Mesfin, Hong Yu, Jiebo Luo

TL;DR

A novel method, causal rule generation with target trial emulation framework (CRTRE), is introduced. It applies randomized trial design principles to estimate the causal effect of association rules for downstream applications such as predicting disease onset.

Abstract

Causal inference and model interpretability are gaining increasing attention, particularly in the biomedical domain. Despite recent advances, decorrelating features in nonlinear environments while keeping representations human-interpretable remains underexplored. In this study, we introduce a novel method called causal rule generation with target trial emulation framework (CRTRE), which applies randomized trial design principles to estimate the causal effect of association rules. We then incorporate these association rules into downstream applications such as predicting disease onset. Extensive experiments on six healthcare datasets, including synthetic data, real-world disease collections, and MIMIC-III/IV, demonstrate the model's superior performance. Specifically, our method achieved a $\beta$ error of 0.907, outperforming DWR (1.024) and SVM (1.141). On real-world datasets, our model achieved accuracies of 0.789, 0.920, and 0.300 on the Esophageal Cancer, Heart Disease, and Cauda Equina Syndrome prediction tasks, respectively, consistently surpassing baseline models. On the ICD code prediction tasks, it achieved macro-AUC scores of 92.8 on MIMIC-III and 96.7 on MIMIC-IV, outperforming the state-of-the-art models KEPT and MSMN. Expert evaluations further validate the model's effectiveness, causality, and interpretability.
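The abstract does not spell out how a rule's causal effect is estimated. As a hedged illustration only, one standard way to emulate a randomized comparison from observational data is inverse propensity weighting (IPW). Under this approach, you treat "the rule's antecedent fires" as the treatment, model the probability of treatment from confounders, and reweight the outcomes. The sketch below swaps in this textbook estimator. It is not CRTRE's own estimator or regularizer, and the function names (`rule_fires`, `fit_propensity`, `ipw_rule_effect`) are hypothetical. It also assumes that features are binary columns of a NumPy array and that the listed covariates capture all confounding.

```python
import numpy as np

# Hypothetical illustration: a textbook inverse-propensity-weighting (IPW)
# estimator stands in for CRTRE's own rule-effect estimator.

def rule_fires(X, antecedent):
    """Treatment indicator: True where every binary feature in `antecedent` equals 1."""
    return np.all(X[:, antecedent] == 1, axis=1)

def fit_propensity(Z, t, lr=0.5, steps=5000):
    """Logistic regression fitted by batch gradient descent; returns P(t = 1 | Z)."""
    Z1 = np.hstack([np.ones((Z.shape[0], 1)), Z])  # prepend an intercept column
    w = np.zeros(Z1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Z1 @ w))
        w -= lr * Z1.T @ (p - t) / len(t)  # gradient of the mean log-loss
    return 1.0 / (1.0 + np.exp(-Z1 @ w))

def ipw_rule_effect(X, y, antecedent, covariates, clip=0.01):
    """Horvitz-Thompson IPW estimate of E[Y | do(rule fires)] - E[Y | do(rule does not fire)].

    Assumes the columns in `covariates` capture all confounding (no unmeasured
    confounders) and do not include the antecedent columns themselves.
    """
    t = rule_fires(X, antecedent).astype(float)
    # Clip propensities away from 0 and 1 so the inverse weights stay bounded.
    e = np.clip(fit_propensity(X[:, covariates], t), clip, 1.0 - clip)
    return np.mean(t * y / e) - np.mean((1.0 - t) * y / (1.0 - e))
```

Consider an example where `X` holds two binary symptom indicators plus a confounder column. In that case, `ipw_rule_effect(X, y, antecedent=[0, 1], covariates=[2])` estimates how the mean outcome would differ if the rule {x0, x1} fired for every patient versus for none. Clipping the propensities trades a small bias for much lower variance when some patients are rarely treated.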

Paper Structure

This paper contains 23 sections, 2 theorems, 29 equations, 3 figures, 5 tables, and 1 algorithm.

Key Result

Lemma 1

If the number of features in the dataset and the number of terms in the Taylor expansion are fixed, then as $n \to \infty$ there exists $W \succeq 0$ (the proof is given in the appendix) such that

Figures (3)

  • Figure 1: We first extracted features from clinical notes into tabular form, filtering out irrelevant attributes to focus on the most pertinent variables. Using the Apriori algorithm, we then generated association rules from the dataset, identifying significant associations among medical conditions, symptoms, diseases, and treatments. After generating the initial set of rules, we pruned redundant or irrelevant ones to ensure relevance and quality. Finally, we applied our novel regularizer to score each rule, assessing its clinical relevance and statistical significance. This combination of the Apriori algorithm and our regularizer produced a concise, meaningful set of association rules, offering valuable insights for clinical decision-making.
  • Figure 2: $\beta_{S}$, $\beta_{V}$, and RMSE under various environments.
  • Figure 3: Panels (a)-(d) show the distribution of Pearson correlation coefficients across various relationships. Panel (e) reports the $\beta$ errors of different models. Panel (f) is under a linear environment; the other panels are under nonlinear environments. Our model achieves the greatest reduction in both linear and nonlinear relationships.
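Figure 1 describes mining candidate rules with the Apriori algorithm and then pruning redundant ones before scoring. A minimal plain-Python sketch of those first two stages follows. The caption does not say how rules were pruned, so the redundancy criterion in `prune_redundant` is an assumption: it drops a rule when another rule with the same consequent and a strictly smaller antecedent is at least as confident. The paper's scoring regularizer is not reproduced here, and all function names are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for all frequent itemsets."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`.
        return sum(itemset <= t for t in tx) / n

    level = {frozenset([i]) for t in tx for i in t}  # candidate 1-itemsets
    frequent, k = {}, 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples above the threshold."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for a in combinations(itemset, r):
                a = frozenset(a)
                conf = supp / frequent[a]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((a, itemset - a, supp, conf))
    return rules

def prune_redundant(rules):
    """Hypothetical criterion: drop a rule if a rule with the same consequent
    and a strictly smaller antecedent has at least the same confidence."""
    return [(a, c, s, conf) for a, c, s, conf in rules
            if not any(c2 == c and a2 < a and conf2 >= conf
                       for a2, c2, _, conf2 in rules)]
```

As an illustration, consider five toy records: three {fever, cough, flu}, one {fever, flu}, and one {cough}. With `min_support=0.4` and `min_confidence=0.8`, the rules {fever} → {flu} and {flu} → {fever} both reach confidence 1.0. The rule {fever, cough} → {flu} is then pruned as redundant given {fever} → {flu}.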

Theorems & Definitions (3)

  • Lemma 1
  • Lemma 2
  • Proof
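Lemma 1 asserts that, when the number of features and the number of Taylor-expansion terms are held fixed, a weight matrix $W \succeq 0$ exists as $n \to \infty$. The condition that $W$ must satisfy is cut off in this summary. Suppose, as in sample-reweighting methods such as DWR (a baseline in the abstract), that $W$ is a diagonal matrix of nonnegative sample weights meant to remove correlations among features. Under that assumption, the sketch below learns such weights by gradient descent on the sum of squared off-diagonal weighted covariances. The softmax parameterization, the loss, and the names `weighted_cov`, `weighted_corr`, and `decorrelation_weights` are all hypothetical choices, not the paper's regularizer. The sketch also decorrelates only the raw features, not their Taylor-expanded terms.

```python
import numpy as np

def weighted_cov(X, p):
    """Covariance of the columns of X under sample weights p (p >= 0, sum(p) = 1)."""
    m = p @ X
    Xc = X - m
    return (Xc * p[:, None]).T @ Xc, m

def weighted_corr(X, p):
    """Pearson correlation matrix of the columns of X under sample weights p."""
    C, _ = weighted_cov(X, p)
    s = np.sqrt(np.diag(C))
    return C / np.outer(s, s)

def decorrelation_weights(X, lr=0.05, steps=2000):
    """Hypothetical sketch: learn weights p_i >= 0 (the diagonal of an assumed
    diagonal W) that shrink L(p) = 0.5 * sum_{j != k} C_jk(p)^2.

    p = softmax(theta) keeps the weights nonnegative and summing to 1.
    """
    n = X.shape[0]
    theta = np.zeros(n)
    for _ in range(steps):
        p = np.exp(theta - theta.max())
        p /= p.sum()
        C, m = weighted_cov(X, p)
        M = C - np.diag(np.diag(C))  # off-diagonal covariances only
        # dL/dp_i = x_i^T M x_i - 2 x_i^T M m (up to a constant that cancels below)
        G = np.einsum("ij,jk,ik->i", X, M, X) - 2.0 * X @ (M @ m)
        # Chain rule through the softmax: dL/dtheta_i = p_i * (G_i - sum_l p_l G_l)
        grad = p * (G - p @ G)
        theta -= lr * n * grad  # rescale by n because each p_i is about 1/n
    p = np.exp(theta - theta.max())
    return p / p.sum()
```

The softmax parameterization keeps the weights nonnegative and normalized without a projection step. Consider a toy matrix whose first two columns have Pearson correlation near 0.8. On that matrix, the learned weights drive the weighted correlations toward zero, mirroring the kind of reduction in Pearson coefficients that Figure 3 reports.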