IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery

Ivaxi Sheth; Zhijing Jin; Bryan Wilder; Dominik Janzing; Mario Fritz

TL;DR

This work investigates whether large language models can aid instrumental variable (IV) discovery under endogeneity. It proposes IV Co-Scientist, a multi-agent framework that generates, critiques, and grounds candidate IVs for a treatment–outcome pair. The approach couples LLM-driven hypothesis generation with validation by CriticAgents and a Grounder that maps IVs to observable proxies. Evaluation runs along two axes, recovery of canonical IVs and avoidance of invalid instruments, and adds a consistency metric for assessing internal validity when no ground truth is available. Using Gapminder data, the study shows that certain LLMs recover literature-based instruments with high semantic alignment and that the CriticAgents filter out invalid options, supporting the potential of LLMs as co-scientists in causal discovery. The results point to a practical way to augment human causal reasoning with automated, context-aware hypothesis generation, and the paper notes limitations related to grounding, generalizability, and reliance on domain knowledge. Overall, the paper advances early-stage IV discovery in high-dimensional observational data by integrating structured reasoning, statistical checks, and grounding steps.

Abstract

In the presence of confounding between an endogenous variable and the outcome, instrumental variables (IVs) are used to isolate the causal effect of the endogenous variable. Identifying valid instruments requires interdisciplinary knowledge, creativity, and contextual understanding, making it a non-trivial task. In this paper, we investigate whether large language models (LLMs) can aid in this task. We use a two-stage evaluation framework. First, we test whether LLMs can recover well-established instruments from the literature, assessing their ability to replicate standard reasoning. Second, we evaluate whether LLMs can identify and avoid instruments that have been empirically or theoretically discredited. Building on these results, we introduce IV Co-Scientist, a multi-agent system that proposes, critiques, and refines IVs for a given treatment-outcome pair. We also introduce a statistical test to contextualize consistency in the absence of ground truth. Our results show the potential of LLMs to discover valid instrumental variables from a large observational database.
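To make the role of an instrument concrete, the following is a minimal sketch on synthetic data, not the paper's method or data. It assumes a linear model with one unobserved confounder `u`, one valid instrument `z`, and a true effect `beta`; the function names (`simulate`, `ols_slope`, `iv_slope`) and all coefficients are hypothetical choices for illustration. It contrasts the confounded regression slope with the single-instrument (Wald) IV estimate.

```python
import numpy as np


def simulate(n=20000, beta=2.0, seed=0):
    """Draw synthetic data with a confounder u and an instrument z (assumed setup)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                        # unobserved confounder
    z = rng.normal(size=n)                        # instrument: drives x, independent of u
    x = 1.0 * z + 1.0 * u + rng.normal(size=n)    # endogenous treatment
    y = beta * x + 3.0 * u + rng.normal(size=n)   # outcome; z affects y only through x
    return z, x, y


def ols_slope(x, y):
    """Slope from regressing y on x; biased here because u drives both x and y."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def iv_slope(z, x, y):
    """Single-instrument IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))


if __name__ == "__main__":
    z, x, y = simulate()
    print("OLS estimate:", ols_slope(x, y))    # population value is 3.0 in this setup
    print("IV estimate: ", iv_slope(z, x, y))  # population value is beta = 2.0
```

In this assumed setup the regression slope absorbs the confounder's influence, while the IV estimate uses only the variation in `x` induced by `z`. The estimate is only as good as the instrument: if `z` were correlated with `u` or affected `y` directly, the IV estimate would be biased as well, which is why screening candidate instruments matters.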
