Optimal Kernel Choice for Score Function-based Causal Discovery
Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong
TL;DR
This work addresses kernel-parameter selection in RKHS-based score functions for causal discovery. It introduces a mutual-information-based objective that models the relation between a variable $X$ and its parents $PA$ as a mixture of independent noise variables, places a Gaussian process prior on the nonlinear mapping, and learns the kernel parameters automatically by maximizing the joint marginal likelihood $p(X, PA)$. The authors prove that the resulting score is locally consistent and show, on synthetic and real-world benchmarks, that automatic kernel learning outperforms median-heuristic kernel choices and prior RKHS-based scores, particularly on dense graphs. The approach yields more accurate causal graphs from observational data and reduces reliance on manual, heuristic kernel tuning.
Abstract
Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method selects kernel parameters manually by heuristics, which is tedious and unlikely to yield an optimal kernel. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method by maximizing the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.
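To make the contrast between heuristic and likelihood-based kernel selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian (RBF) kernel and uses the standard Gaussian process regression marginal likelihood of $X$ given $PA$ as a simplified stand-in for the joint marginal likelihood $p(X, PA)$ described above. The function names, the grid search, and the noise-variance parameter are illustrative assumptions; a practical version would more likely optimize the kernel parameters by gradient ascent.

```python
import numpy as np

def pairwise_sq_dists(Z):
    # Squared Euclidean distances between all rows of Z (shape n x d).
    sq_norms = np.sum(Z ** 2, axis=1)
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    return np.maximum(sq, 0.0)  # clamp tiny negatives from round-off

def rbf_kernel(Z, width):
    # Gaussian kernel matrix k(z, z') = exp(-||z - z'||^2 / (2 * width^2)).
    return np.exp(-pairwise_sq_dists(Z) / (2.0 * width ** 2))

def median_heuristic_width(Z):
    # Heuristic baseline: median pairwise distance between samples.
    sq = pairwise_sq_dists(Z)
    upper = np.triu_indices(Z.shape[0], k=1)
    return float(np.median(np.sqrt(sq[upper])))

def gp_log_marginal_likelihood(x, PA, width, noise_var):
    # log N(x | 0, K + noise_var * I): the standard GP regression evidence.
    # ASSUMPTION: simplified stand-in, not the paper's joint objective.
    n = x.shape[0]
    K = rbf_kernel(PA, width) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, x))
    return float(-0.5 * x @ alpha
                 - np.sum(np.log(np.diag(L)))
                 - 0.5 * n * np.log(2.0 * np.pi))

def select_kernel_width(x, PA, widths, noise_vars):
    # Grid search (hypothetical helper): return the (score, width, noise_var)
    # triple with the highest marginal likelihood.
    candidates = [(gp_log_marginal_likelihood(x, PA, w, s), w, s)
                  for w in widths for s in noise_vars]
    return max(candidates)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    PA = rng.normal(size=(100, 2))                       # synthetic parents
    x = np.sin(2.0 * PA[:, 0]) + 0.1 * rng.normal(size=100)
    w_med = median_heuristic_width(PA)
    widths = [0.1, 0.3, 1.0, 3.0, w_med]
    score, w_best, s_best = select_kernel_width(x, PA, widths, [0.01, 0.1])
    print("median-heuristic width:", w_med)
    print("selected width:", w_best, "noise variance:", s_best)
```

Because the median-heuristic width is included in the candidate grid, the selected parameters can never score lower than the heuristic ones under this criterion; whether that translates into better graph recovery is what the paper's experiments evaluate.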
