Learning to Defer for Causal Discovery with Imperfect Experts
Oscar Clivio, Divyat Mahajan, Perouz Taslakian, Sara Magliacane, Ioannis Mitliagkas, Valentina Zantedeschi, Alexandre Drouin
TL;DR
This work addresses how to incorporate imperfect expert guidance into causal discovery (CD) by learning when to defer to an expert (e.g., an LLM) and when to rely on a data-driven CD method for determining pairwise causal direction. The authors adapt learning-to-defer (L2D) to build L2D-CD, a deferral framework that uses textual context and numerical data to predict the correct causal direction between two variables; in the single-expert case, the problem reduces to standard classification. The approach is validated empirically on the Tübingen pairs dataset, where it shows consistent improvements over both the expert and the CD baseline across multiple CD methods and expert types, and it can identify domains where the expert is strong or weak. The authors also outline a path to extend the method to graphs with three or more variables via ranking from pairwise comparisons. Overall, L2D-CD offers a principled way to combine domain knowledge with data-driven causal discovery while remaining robust to expert errors.
Abstract
Integrating expert knowledge, e.g. from large language models, into causal discovery algorithms can be challenging when the knowledge is not guaranteed to be correct. Expert recommendations may contradict data-driven results, and their reliability can vary significantly depending on the domain or specific query. Existing methods based on soft constraints or inconsistencies in predicted causal relationships fail to account for these variations in expertise. To remedy this, we propose L2D-CD, a method for gauging the correctness of expert recommendations and optimally combining them with data-driven causal discovery results. By adapting learning-to-defer (L2D) algorithms for pairwise causal discovery (CD), we learn a deferral function that selects whether to rely on classical causal discovery methods using numerical data or expert recommendations based on textual meta-data. We evaluate L2D-CD on the canonical Tübingen pairs dataset and demonstrate its superior performance compared to both the causal discovery method and the expert used in isolation. Moreover, our approach identifies domains where the expert's performance is strong or weak. Finally, we outline a strategy for generalizing this approach to causal discovery on graphs with more than two variables, paving the way for further research in this area.
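As an illustration of the deferral idea described in the abstract, the following minimal Python sketch shows how per-pair deferral decisions could combine the two sources of predictions. It is not the authors' implementation: the function names `l2d_combine` and `accuracy`, the binary encoding of causal direction, and the toy values are all assumptions made for this example. In particular, the deferral decisions are set by hand here, whereas in L2D-CD they would come from a learned deferral function.

```python
from typing import List, Sequence


def l2d_combine(
    cd_preds: Sequence[int],
    expert_preds: Sequence[int],
    defer_to_expert: Sequence[bool],
) -> List[int]:
    """Pick, for each variable pair, the expert's or the CD method's prediction.

    Hypothetical helper: defer_to_expert[i] is True when the pair should be
    handed to the expert, False when the CD method's output should be kept.
    """
    return [e if d else c for c, e, d in zip(cd_preds, expert_preds, defer_to_expert)]


def accuracy(preds: Sequence[int], truths: Sequence[int]) -> float:
    """Fraction of pairs whose predicted direction matches the ground truth."""
    return sum(p == t for p, t in zip(preds, truths)) / len(truths)


# Toy data (made up for illustration): 1 means X -> Y, 0 means Y -> X.
truths = [1, 0, 1, 1]
cd_preds = [1, 1, 0, 1]       # data-driven CD method, correct on pairs 0 and 3
expert_preds = [0, 0, 1, 1]   # imperfect expert, correct on pairs 1, 2 and 3

# Hand-set deferral decisions standing in for a learned deferral function.
defer_to_expert = [False, True, True, False]

combined = l2d_combine(cd_preds, expert_preds, defer_to_expert)

print("CD accuracy:      ", accuracy(cd_preds, truths))
print("Expert accuracy:  ", accuracy(expert_preds, truths))
print("Combined accuracy:", accuracy(combined, truths))
```

In this toy setting, the deferral decisions are ideal by construction, so the combined predictions beat both sources used alone. A learned deferral function would only approximate such decisions.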
