Operationalizing Longitudinal Causal Discovery Under Real-World Workflow Constraints

Tadahisa Okuda, Shohei Shimizu, Thong Pham, Tatsuyoshi Ikenoue, Shingo Fukuma

TL;DR

The authors argue that formalizing workflow-derived constraint classes improves structural interpretability without relying on domain-specific edge specification, providing a reproducible bridge between operational workflows and longitudinal causal discovery under standard identifiability assumptions.

Abstract

Causal discovery has achieved substantial theoretical progress, yet its deployment in large-scale longitudinal systems remains limited. A key obstacle is that operational data are generated under institutional workflows whose induced partial orders are rarely formalized, enlarging the admissible graph space in ways inconsistent with the recording process. We characterize a workflow-induced constraint class for longitudinal causal discovery that restricts the admissible directed acyclic graph space through protocol-derived structural masks and timeline-aligned indexing. Rather than introducing a new optimization algorithm, we show that explicitly encoding workflow-consistent partial orders reduces structural ambiguity, especially in mixed discrete-continuous panels where within-time orientation is weakly identified. The framework combines four components: workflow-derived admissible-edge constraints; measurement-aligned time indexing with block structure; bootstrap-based uncertainty quantification for lagged total effects; and a dynamic representation supporting intervention queries. In a nationwide annual health screening cohort in Japan with 107,261 individuals and 429,044 person-years, workflow-constrained longitudinal LiNGAM yields temporally consistent within-time substructures and interpretable lagged total effects with explicit uncertainty. Sensitivity analyses using alternative exposure and body-composition definitions preserve the main qualitative patterns. We argue that formalizing workflow-derived constraint classes improves structural interpretability without relying on domain-specific edge specification, providing a reproducible bridge between operational workflows and longitudinal causal discovery under standard identifiability assumptions.
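The workflow-derived admissible-edge constraints can be pictured as a boolean mask over the stacked time-by-variable index. The sketch below is a minimal illustration, assuming the three rules listed in the Figure B.1 caption: no edges backward in time, a within-time block order taken from the recording protocol, and cross-time links only from t − 1 to t. The function name `workflow_mask`, the toy block assignment, and the B[i, j] orientation convention are hypothetical choices for illustration, not the authors' implementation. Causal discovery libraries (for example, lingam) use their own prior-knowledge encodings, so check the library's documented convention before translating such a mask.

```python
import numpy as np


def workflow_mask(blocks, n_times):
    """Hypothetical boolean admissible-edge mask over a stacked (time, variable) index.

    mask[i, j] is True when the edge j -> i is admissible, following the
    x = B x + e convention in which B[i, j] is the effect of x_j on x_i.
    Variable v at time t has flat index t * p + v, where p = len(blocks).
    """
    blocks = np.asarray(blocks)
    p = blocks.size
    n = p * n_times
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        t_to, v_to = divmod(i, p)
        for j in range(n):
            t_from, v_from = divmod(j, p)
            if i == j:
                continue  # no self-loops
            if t_from == t_to:
                # (ii) within time: the cause's block must not come after the
                # effect's block; edges inside one block stay open both ways
                mask[i, j] = blocks[v_from] <= blocks[v_to]
            else:
                # (i) no edges backward in time, (iii) cross-time links only at lag 1
                mask[i, j] = t_from == t_to - 1
    return mask


# Hypothetical layout: 3 variables per time point, variable 0 recorded in an
# earlier block than variables 1 and 2, observed at 3 annual time points.
m = workflow_mask([0, 1, 1], n_times=3)
print(int(m.sum()), "of", m.size, "entries admissible")
```

Edges inside a single block are left open in both directions, so acyclicity within a block is still the estimator's job; the mask only removes orientations that contradict the recording order or the one-year lag structure.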

Paper Structure

This paper contains 68 sections, 1 equation, 15 figures, 7 tables, and 1 algorithm.

Figures (15)

  • Figure 1: Bootstrap distributions of lagged total effects from health guidance in 2020 to outcomes measured in 2021–2023 (lags 0–2), under the workflow-derived constraints (B = 1000 bootstrap replicates). Panels are arranged in a 3 × 5 grid: rows correspond to lag and columns to outcomes (BMI, SBP, DBP, HbA1c, LDL). Dashed vertical lines indicate the 95% bootstrap percentile interval, and solid vertical lines mark zero.
  • Figure 2: Compact recurring subgraph (motif) summarizing within-time relations among the five continuous health screening outcomes. Directed edges indicate directions that are consistent across time points 1–3 under the workflow-derived constraints. The undirected SBP–DBP link indicates a recurring adjacency whose direction varies across time points. The motif is a descriptive summary of recurrent within-time structure, not the full longitudinal graph.
  • Figure A.1: Variable set and time-point alignment used in this study. Each time point contains 15 variables: health guidance, five continuous health screening outcomes (BMI, SBP, DBP, HbA1c, LDL), three medication indicators, three lifestyle indicators, demographics (Age, Sex), and Check_num (attendance history in the three years preceding the measurement year). Cells marked with † are shown only to illustrate the fixed tensor layout across time points and are excluded from model fitting.
  • Figure A.2: Schematic of the health guidance assignment logic used to construct the assignment indicator for sensitivity analysis. The selection proceeds from waist circumference screening to assessment of risk factor status, with medication-treated individuals excluded from eligibility (details are simplified for exposition).
  • Figure B.1: Conceptual diagram of the workflow-derived prior knowledge. The prior knowledge encodes (i) time ordering across annual visits, (ii) within-time block ordering consistent with the recording protocol, and (iii) admissible cross-time links restricted to a one-year lag (t − 1 → t). The prior knowledge is derived from generic invariances and recording-protocol constraints, not from medical causal assumptions.
  • ...and 10 more figures
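Figure 1 reports 95% bootstrap percentile intervals from B = 1000 replicates. The sketch below shows that interval construction under stated assumptions: individuals are treated as the resampling unit (an assumption; the listing does not say how resampling was done), and a hypothetical OLS slope `ols_slope` stands in for the lagged total-effect estimator, which in a full pipeline would refit the constrained model on each resample. The function names and the synthetic data are illustrative only and do not reproduce any result from the paper.

```python
import numpy as np


def ols_slope(x, y):
    """Hypothetical stand-in estimator: OLS slope of y on x (not a LiNGAM total effect)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def bootstrap_percentile_ci(x, y, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Nonparametric bootstrap over individuals with a percentile interval.

    x[k] and y[k] belong to individual k, so resampling indices keeps each
    individual's exposure and later outcome together.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample individuals with replacement
        draws[b] = estimator(x[idx], y[idx])
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return estimator(x, y), (float(lo), float(hi)), draws


# Synthetic illustration only: binary exposure, continuous later outcome.
rng = np.random.default_rng(1)
exposure = rng.binomial(1, 0.3, size=2000).astype(float)
outcome = -0.5 * exposure + rng.normal(size=2000)
est, (lo, hi), draws = bootstrap_percentile_ci(exposure, outcome, ols_slope)
print(f"estimate {est:.3f}, 95% percentile interval [{lo:.3f}, {hi:.3f}]")
print("interval excludes zero:", not (lo <= 0.0 <= hi))
```

The percentile interval simply reads off the 2.5th and 97.5th quantiles of the replicate estimates, which matches the dashed-line intervals and zero reference described for Figure 1; whether an interval excludes zero is a descriptive check, not a formal test.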