KL-BSS: Rethinking optimality for neighbourhood selection in structural equation models
Ming Gao, Wai Ming Tai, Bryon Aragam
TL;DR
This work tackles neighbourhood selection in linear SEMs, where dependence among covariates challenges standard support-recovery methods like BSS and the Lasso. It introduces KL-BSS, a KL-divergence-inspired estimator that augments the BSS framework with a beta-min constrained score, enabling it to exploit unknown SEM structure. The authors establish pointwise and minimax sample complexities via eigenvalues $\lambda_K(\Sigma)$ and $\lambda_B(\Sigma)$, showing KL-BSS achieves strictly better performance on designs in $\Omega_\Delta$ and is minimax-optimal over a broad class. They also provide practical MIP implementations and extensions to unknown sparsity and $\beta_{\min}$, with extensive simulations and a pan-cancer data application demonstrating improvements in both recovery and downstream prediction. Overall, KL-BSS advances neighbourhood selection in SEMs by leveraging latent structure even when unknown, with broad implications for causal structure learning.
Abstract
We introduce a new method for neighbourhood selection in linear structural equation models (SEMs) that improves over classical methods such as best subset selection (BSS) and the Lasso. Our method, called KL-BSS, takes advantage of the underlying structure in SEMs, even when this structure is unknown, and is easily implemented using existing solvers. Under weaker eigenvalue conditions than those required by BSS and the Lasso, KL-BSS provably recovers the support of linear models with fewer samples. We establish both the pointwise and minimax sample complexity for recovery, both of which KL-BSS attains. Extensive experiments on both real and simulated data confirm the improvements offered by KL-BSS. While it is well known that the Lasso encounters difficulties under structured dependencies, it is less well known that BSS also runs into trouble and can be substantially improved. These results have implications for structure learning in graphical models, which often relies on neighbourhood selection as a subroutine.
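To make the setting concrete, the following is a minimal sketch of the classical BSS baseline that the abstract compares against; it is not the KL-BSS estimator, whose score is not specified here. The chain-structured SEM over the covariates, the coefficient values, the sample size, and the function names `simulate_sem` and `best_subset` are all illustrative assumptions rather than anything taken from the paper.

```python
import itertools

import numpy as np


def simulate_sem(n, rng):
    """Draw n samples from a hypothetical chain SEM X0 -> X1 -> X2 -> X3.

    The response depends only on X0 and X2, so the true support is (0, 2).
    All coefficients below are assumptions made for illustration.
    """
    x = np.zeros((n, 4))
    x[:, 0] = rng.normal(size=n)
    for j in range(1, 4):
        # Each covariate is a linear function of its parent plus noise.
        x[:, j] = 0.8 * x[:, j - 1] + rng.normal(size=n)
    y = 1.0 * x[:, 0] - 1.0 * x[:, 2] + rng.normal(size=n)
    return x, y


def best_subset(x, y, s):
    """Brute-force BSS: return the size-s subset with the smallest RSS."""
    best_rss, best_support = np.inf, None
    for subset in itertools.combinations(range(x.shape[1]), s):
        cols = x[:, list(subset)]
        # Ordinary least squares restricted to the candidate subset.
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        rss = float(np.sum((y - cols @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_support = rss, subset
    return best_support


rng = np.random.default_rng(0)
x, y = simulate_sem(2000, rng)
support = best_subset(x, y, 2)
print("estimated support:", support)
```

The exhaustive search is only feasible for a handful of covariates; the TL;DR notes that the authors rely on MIP implementations instead. With this many samples the baseline should recover the true support; the regime the abstract is concerned with is the small-sample one, where dependence among the covariates makes recovery harder.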
