Fast Flow Matching based Conditional Independence Tests for Causal Discovery

Shunyu Zhao, Yanfeng Yang, Shuai Li, Kenji Fukumizu

TL;DR

This work tackles the computational bottleneck of constraint-based causal discovery by introducing FMCIT, a fast conditional independence (CI) test built on flow matching that learns the joint distribution once and uses Picard sampling for efficient conditional imputation. Integrating FMCIT into a two-stage, budgeted PC skeleton learning framework (GPC-FMCIT) reduces the total number of CI queries while maintaining statistical power. Empirical results on synthetic benchmarks, high-dimensional nonlinear SEMs, and real flow-cytometry data show favorable accuracy-efficiency trade-offs relative to existing CI tests and PC variants, with strong type-I error control in many settings. The approach offers a practical, scalable route to causal discovery in complex, nonlinear, high-dimensional systems, with potential extensions to theoretical guarantees and doubly robust improvements.

Abstract

Constraint-based causal discovery methods require a large number of conditional independence (CI) tests, and the resulting computational cost severely limits their practical applicability. It is therefore crucial to design an algorithm that accelerates each individual test. To this end, we propose the Flow Matching-based Conditional Independence Test (FMCIT). The proposed test leverages the high computational efficiency of flow matching and requires the model to be trained only once for the entire causal discovery procedure, substantially accelerating causal discovery. In numerical experiments, FMCIT effectively controls type-I error and maintains high testing power under the alternative hypothesis, even with high-dimensional conditioning sets. We further integrate FMCIT into a two-stage guided PC skeleton learning framework, termed GPC-FMCIT, which combines fast screening with guided, budgeted refinement using FMCIT. This design yields explicit bounds on the number of CI queries while maintaining high statistical power. Experiments on synthetic and real-world causal discovery tasks demonstrate favorable accuracy-efficiency trade-offs over existing CI testing methods and PC variants.
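
To give a sense of how quickly the number of CI tests can grow, the following minimal sketch counts the queries issued by a naive exhaustive search that tests every variable pair against every conditioning set up to a maximum size. This is an illustrative worst case under that stated assumption, not the query bound of GPC-FMCIT, and the function name worst_case_ci_queries is hypothetical.

```python
from math import comb


def worst_case_ci_queries(num_vars: int, max_cond_size: int) -> int:
    """Count (pair, conditioning set) combinations for an exhaustive search.

    Assumption for illustration: every unordered pair of variables is tested
    against every subset of the remaining variables with size 0..max_cond_size.
    """
    if num_vars < 2 or max_cond_size < 0:
        return 0
    pairs = comb(num_vars, 2)
    # Conditioning sets are drawn from the num_vars - 2 variables outside the pair.
    cond_sets = sum(comb(num_vars - 2, k) for k in range(max_cond_size + 1))
    return pairs * cond_sets


if __name__ == "__main__":
    for d in (10, 20, 50):
        print(d, worst_case_ci_queries(d, max_cond_size=3))
```

Even with conditioning sets capped at size 3, the count grows polynomially with a high degree in the number of variables, which is why both the per-test cost and the total number of queries matter.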

Paper Structure (32 sections, 18 equations, 1 figure, 3 tables, 4 algorithms)